Commits · 810d831bbbf3cbd86e5aa91c8485b4d35186144d · Kirill Smelkov / linux

22 May, 2024 13 commits

virtio_balloon: Give the balloon its own wakeup source · 810d831b

David Stevens authored Mar 21, 2024

Wakeup sources don't support nesting multiple events, so sharing a
single object between multiple drivers can result in one driver
overriding the wakeup event processing period specified by another
driver. Have the virtio balloon driver use the wakeup source of the
device it is bound to rather than the wakeup source of the parent
device, to avoid conflicts with the transport layer.

Note that although the virtio balloon's virtio_device itself isn't what
actually wakes up the device, it is responsible for processing wakeup
events. In the same way that EPOLLWAKEUP uses a dedicated wakeup_source
to prevent suspend when userspace is processing wakeup events, a
dedicated wakeup_source is necessary when processing wakeup events in a
higher layer in the kernel.

Fixes: b12fbc3f ("virtio_balloon: stay awake while adjusting balloon")
Signed-off-by: David Stevens <stevensd@chromium.org>
Acked-by: David Hildenbrand <david@redhat.com>
Message-Id: <20240321012445.1593685-2-stevensd@google.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
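
The nesting problem described above can be illustrated with a toy model. This is a minimal Python sketch, not kernel code: `ToyWakeupSource`, `stay_awake()` and `relax()` are hypothetical names that only mimic the idea of a single "active" flag with no nesting count.

```python
class ToyWakeupSource:
    """Toy stand-in for a wakeup source: one active flag, no nesting count."""

    def __init__(self, name):
        self.name = name
        self.active = False

    def stay_awake(self):
        # Activating an already active source does not nest.
        self.active = True

    def relax(self):
        # Deactivates the source no matter how many users activated it.
        self.active = False


def shared_source_scenario():
    """Two layers share one source; the balloon finishes first."""
    shared = ToyWakeupSource("parent")
    shared.stay_awake()   # transport layer starts handling an event
    shared.stay_awake()   # balloon starts adjusting
    shared.relax()        # balloon is done
    return shared.active  # transport is still busy, yet the flag is cleared


def dedicated_source_scenario():
    """Each layer owns its source; the balloon finishes first."""
    transport = ToyWakeupSource("parent")
    balloon = ToyWakeupSource("virtio-balloon")
    transport.stay_awake()
    balloon.stay_awake()
    balloon.relax()
    return transport.active  # unaffected by the balloon's relax()
```

In the shared case the second user silently ends the first user's wakeup event, which is the kind of conflict a dedicated source avoids.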

810d831b

virtio-mem: support suspend+resume · e4544c55

David Hildenbrand authored Mar 18, 2024

With virtio-mem, primarily hibernation is problematic: as the machine shuts
down, the virtio-mem device loses its state. Powering the machine back up
is like losing a bunch of DIMMs. While there would be ways to add limited
support, suspend+resume is more commonly used for VMs and "easier" to
support cleanly.

s2idle can be supported without any device dependencies. Similarly, one
would expect suspend-to-ram (i.e., S3) to work out of the box. However,
QEMU currently unplugs all device memory when resuming the VM, using a
cold reset on the "wakeup" path. In order to support S3, we need a feature
flag for the device to tell us if memory remains plugged when waking up. In
the future, QEMU will implement this feature.

So let's always support s2idle and support S3 with plugged memory only if
the device indicates support. Block hibernation early using the PM
notifier.

Trying to hibernate now fails early:
	# echo disk > /sys/power/state
	[   26.455369] PM: hibernation: hibernation entry
	[   26.458271] virtio_mem virtio0: hibernation is not supported.
	[   26.462498] PM: hibernation: hibernation exit
	-bash: echo: write error: Operation not permitted

s2idle works even without the new feature bit:
	# echo s2idle > /sys/power/mem_sleep
	# echo mem > /sys/power/state
	[   52.083725] PM: suspend entry (s2idle)
	[   52.095950] Filesystems sync: 0.010 seconds
	[   52.101493] Freezing user space processes
	[   52.104213] Freezing user space processes completed (elapsed 0.001 seconds)
	[   52.106520] OOM killer disabled.
	[   52.107655] Freezing remaining freezable tasks
	[   52.110880] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
	[   52.113296] printk: Suspending console(s) (use no_console_suspend to debug)

S3 does not work without the feature bit when memory is plugged:
	# echo deep > /sys/power/mem_sleep
	# echo mem > /sys/power/state
	[   32.788281] PM: suspend entry (deep)
	[   32.816630] Filesystems sync: 0.027 seconds
	[   32.820029] Freezing user space processes
	[   32.823870] Freezing user space processes completed (elapsed 0.001 seconds)
	[   32.827756] OOM killer disabled.
	[   32.829608] Freezing remaining freezable tasks
	[   32.833842] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
	[   32.837953] printk: Suspending console(s) (use no_console_suspend to debug)
	[   32.916172] virtio_mem virtio0: suspend+resume with plugged memory is not supported
	[   32.916181] virtio-pci 0000:00:02.0: PM: pci_pm_suspend(): virtio_pci_freeze+0x0/0x50 returns -1
	[   32.916197] virtio-pci 0000:00:02.0: PM: dpm_run_callback(): pci_pm_suspend+0x0/0x170 returns -1
	[   32.916210] virtio-pci 0000:00:02.0: PM: failed to suspend async: error -1

But S3 works with the new feature bit when memory is plugged (patched
QEMU):
	# echo deep > /sys/power/mem_sleep
	# echo mem > /sys/power/state
	[   33.983694] PM: suspend entry (deep)
	[   34.009828] Filesystems sync: 0.024 seconds
	[   34.013589] Freezing user space processes
	[   34.016722] Freezing user space processes completed (elapsed 0.001 seconds)
	[   34.019092] OOM killer disabled.
	[   34.020291] Freezing remaining freezable tasks
	[   34.023549] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
	[   34.026090] printk: Suspending console(s) (use no_console_suspend to debug)

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20240318120645.105664-1-david@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
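
The policy described above can be summarised as a small decision function. This is a hedged Python sketch, not the driver's code: `SleepKind`, `sleep_allowed()` and its parameters are hypothetical names, and allowing S3 when no memory is plugged is an assumption read from the log message "suspend+resume with plugged memory is not supported".

```python
from enum import Enum


class SleepKind(Enum):
    S2IDLE = "s2idle"
    S3 = "deep"
    HIBERNATION = "disk"


def sleep_allowed(kind, memory_plugged, device_keeps_memory):
    """Return True if the toy policy lets the transition proceed.

    device_keeps_memory models the new feature flag: the device promises
    that plugged memory stays plugged across wakeup.
    """
    if kind is SleepKind.HIBERNATION:
        return False  # always blocked early
    if kind is SleepKind.S2IDLE:
        return True   # no device dependency
    # S3: only a problem when memory is plugged and the flag is missing
    # (assumption: S3 with nothing plugged is fine).
    return device_keeps_memory or not memory_plugged
```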

e4544c55

kernel: Remove signal hacks for vhost_tasks · 240a1853

Mike Christie authored Mar 15, 2024

This removes the signal/coredump hacks added for vhost_tasks in:

Commit f9010dbd ("fork, vhost: Use CLONE_THREAD to fix freezer/ps regression")

When that patch was added vhost_tasks did not handle SIGKILL and would
try to ignore/clear the signal and continue on until the device's close
function was called. In the previous patches vhost_tasks and the vhost
drivers were converted to support SIGKILL by cleaning themselves up and
exiting. The hacks are no longer needed so this removes them.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20240316004707.45557-10-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

240a1853

vhost_task: Handle SIGKILL by flushing work and exiting · db5247d9

Mike Christie authored Mar 15, 2024

Instead of lingering until the device is closed, this has us handle
SIGKILL by:

1. marking the worker as killed so we no longer try to use it with
   new virtqueues and new flush operations.
2. setting the virtqueue to worker mapping so no new works are queued.
3. running all the existing works.
Suggested-by: Edward Adam Davis <eadavis@qq.com>
Reported-and-tested-by: syzbot+98edc2df894917b3431f@syzkaller.appspotmail.com
Message-Id: <tencent_546DA49414E876EEBECF2C78D26D242EE50A@qq.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20240316004707.45557-9-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
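
The three steps above can be sketched with a toy model. This is an illustrative Python sketch only: `ToyWorker`, `ToyVirtqueue`, `work_queue()` and `handle_sigkill()` are hypothetical names, locking is left out entirely, and step 2 is modelled as clearing the virtqueue's worker pointer.

```python
from collections import deque


class ToyWorker:
    def __init__(self):
        self.killed = False
        self.queue = deque()  # pending work items (callables)


class ToyVirtqueue:
    def __init__(self, worker):
        self.worker = worker


def work_queue(vq, work):
    """Queue work on the vq's worker; False means the caller must clean up."""
    worker = vq.worker
    if worker is None or worker.killed:
        return False
    worker.queue.append(work)
    return True


def handle_sigkill(worker, virtqueues):
    """Toy version of the SIGKILL path; returns results of the drained works."""
    worker.killed = True              # step 1: no new vqs or flushes use it
    for vq in virtqueues:             # step 2: detach so nothing new is queued
        if vq.worker is worker:
            vq.worker = None
    results = []
    while worker.queue:               # step 3: run what was already queued
        results.append(worker.queue.popleft()())
    return results
```

After `handle_sigkill()` returns, `work_queue()` reports failure, which is the case the earlier vhost-scsi patches in this series prepare for by freeing resources instead of queueing a response.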

db5247d9

vhost: Release worker mutex during flushes · ba704ff4

Mike Christie authored Mar 15, 2024

In the next patches where the worker can be killed while in use, we
need to be able to take the worker mutex and kill queued works for
new IO and flushes, and set some new flags to prevent new
__vhost_vq_attach_worker calls from swapping in/out killed workers.

If we are holding the worker mutex during a flush and the flush's work
is still in the queue, the worker code that will handle the SIGKILL
cleanup won't be able to take the mutex and perform its cleanup. So
this patch has us drop the worker mutex while waiting for the flush
to complete.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20240316004707.45557-8-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

ba704ff4

vhost: Use virtqueue mutex for swapping worker · 34cf9ba5

Mike Christie authored Mar 15, 2024

__vhost_vq_attach_worker uses the vhost_dev mutex to serialize the
swapping of a virtqueue's worker. This was done for simplicity because
we are already holding that mutex.

In the next patches where the worker can be killed while in use, we need
finer grained locking because some drivers will hold the vhost_dev mutex
while flushing. However in the SIGKILL handler in the next patches, we
will need to be able to swap workers (set current one to NULL), kill
queued works and stop new flushes while flushes are in progress.

To prepare us, this has us use the virtqueue mutex for swapping workers
instead of the vhost_dev one.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20240316004707.45557-7-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

34cf9ba5

vhost_scsi: Handle vhost_vq_work_queue failures for TMFs · 0352c961

Mike Christie authored Mar 15, 2024

vhost_vq_work_queue will never fail when queueing the TMF's response
handling because a guest can only send us TMFs when the device is fully
setup so there is always a worker at that time. In the next patches we
will modify the worker code so it handles SIGKILL by exiting before
outstanding commands/TMFs have sent their responses. In that case
vhost_vq_work_queue can fail when we try to send a response.

This has us just free the TMF's resources since at this time the guest
won't be able to get a response even if we could send it.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20240316004707.45557-6-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

0352c961

vhost: Remove vhost_vq_flush · d9e59eec

Mike Christie authored Mar 15, 2024

vhost_vq_flush is no longer used so remove it.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20240316004707.45557-5-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

d9e59eec

vhost-scsi: Use system wq to flush dev for TMFs · 59b701b9

Mike Christie authored Mar 15, 2024

We flush all the workers that are not also used by the ctl vq to make
sure that responses queued by LIO before the TMF response are sent
before the TMF response. This requires a special vhost_vq_flush
function which, in the next patches where we handle SIGKILL killing
workers while in use, will require extra locking/complexity. To avoid
that, this patch has us flush the entire device from the system work
queue, then queue up sending the response from there.

This is a little less optimal since we now flush all workers but this
will be ok since commands have already timed out and perf is not a
concern.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20240316004707.45557-4-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

59b701b9

vhost-scsi: Handle vhost_vq_work_queue failures for cmds · 1eceddee

Mike Christie authored Mar 15, 2024

In the next patches we will support the vhost_task being killed while in
use. The problem for vhost-scsi is that we can't free some structs until
we get responses for commands we have submitted to the target layer and
we currently process the responses from the vhost_task.

This has us just drop the responses and free the command's resources. When
all commands have completed then operations like flush will be woken up
and we can complete device release and endpoint cleanup.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20240316004707.45557-3-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

1eceddee

vhost-scsi: Handle vhost_vq_work_queue failures for events · b1b2ce58

Mike Christie authored Mar 15, 2024

Currently, we can try to queue an event's work before the vhost_task is
created. When this happens we just drop it in vhost_scsi_do_plug before
even calling vhost_vq_work_queue. During a device shutdown we do the
same thing after vhost_scsi_clear_endpoint has cleared the backends.

In the next patches we will be able to kill the vhost_task before we
have cleared the endpoint. In that case, vhost_vq_work_queue can fail
and we will leak the event's memory. This has us handle the failure by
just freeing the event. This is safe to do, because
vhost_vq_work_queue will only return failure for us when the vhost_task
is killed and so userspace will not be able to handle events if we
sent them.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20240316004707.45557-2-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

b1b2ce58

vdpa: Convert sprintf/snprintf to sysfs_emit · 7b1b5c7f

Li Zhijian authored Mar 14, 2024

Per filesystems/sysfs.rst, show() should only use sysfs_emit()
or sysfs_emit_at() when formatting the value to be returned to user space.

coccinelle complains that there are still a couple of functions that use
snprintf(). Convert them to sysfs_emit().

Any remaining sprintf() calls are converted as well.

Generally, this patch is generated by
make coccicheck M=<path/to/file> MODE=patch \
COCCI=scripts/coccinelle/api/device_attr_show.cocci

No functional change intended

CC: "Michael S. Tsirkin" <mst@redhat.com>
CC: Jason Wang <jasowang@redhat.com>
CC: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
CC: virtualization@lists.linux.dev
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Message-Id: <20240314095853.1326111-1-lizhijian@fujitsu.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

7b1b5c7f

vp_vdpa: Fix return value check vp_vdpa_request_irq · f181a373

Yuxue Liu authored Mar 25, 2024

In the vp_vdpa_set_status function, when setting the device status to
VIRTIO_CONFIG_S_DRIVER_OK, the vp_vdpa_request_irq function may fail.
In such cases, the device status should not be set to DRIVER_OK. Add
an error message to alert the user.
Signed-off-by: Yuxue Liu <yuxue.liu@jaguarmicro.com>
Message-Id: <20240325105448.235-1-gavin.liu@jaguarmicro.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

f181a373

01 May, 2024 2 commits

virtio-mmio: Convert to platform remove callback returning void · e117d9b6

Uwe Kleine-König authored Mar 08, 2024

The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.

To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Message-Id: <ef71f955531d5e41b20d801e1149bb08d155679a.1709886922.git.u.kleine-koenig@pengutronix.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

e117d9b6

MAINTAINERS: apply maintainer role of Intel vDPA driver · 4d556931

Zhu Lingshan authored Feb 27, 2024

This commit adds me as a maintainer of the Intel vDPA
driver.

I am the author of this driver and have been contributing to
it for a long time. I would like to help this solution evolve
in the future.

This driver is still under virtio maintenance.
Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Message-Id: <20240227144519.555554-1-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>

4d556931

30 Apr, 2024 5 commits

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 18daea77

Linus Torvalds authored Apr 30, 2024

Pull kvm fix from Paolo Bonzini:
 "A pretty straightforward fix for a NULL pointer dereference, plus the
  accompanying reproducer"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: selftests: Add test for uaccesses to non-existent vgic-v2 CPUIF
  KVM: arm64: vgic-v2: Check for non-NULL vCPU in vgic_v2_parse_attr()

18daea77

Merge tag 'kvmarm-fixes-6.9-2' of... · 16c20208

Paolo Bonzini authored Apr 30, 2024

Merge tag 'kvmarm-fixes-6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 6.9, part #2

- Fix + test for a NULL dereference resulting from unsanitised user
  input in the vgic-v2 device attribute accessors

16c20208

Merge tag 'for-v6.9-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply · 50dffbf7

Linus Torvalds authored Apr 30, 2024

Pull power supply fixes from Sebastian Reichel:

 - mt6360_charger: Fix of_match for usb-otg-vbus regulator

 - rt9455: Fix unused-const-variable for !CONFIG_USB_PHY

* tag 'for-v6.9-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply:
  power: supply: mt6360_charger: Fix of_match for usb-otg-vbus regulator
  power: rt9455: hide unused rt9455_boost_voltage_values

50dffbf7

Merge tag 'platform-drivers-x86-v6.9-4' of... · a52a0b39

Linus Torvalds authored Apr 30, 2024

Merge tag 'platform-drivers-x86-v6.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver fix from Ilpo Järvinen:

 - Add Grand Ridge to HPM CPU list

* tag 'platform-drivers-x86-v6.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86: ISST: Add Grand Ridge to HPM CPU list

a52a0b39

Merge tag 'pinctrl-v6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl · e5c8fc59

Linus Torvalds authored Apr 30, 2024

Pull pin control fixes from Linus Walleij:

 - Fix a double-free in the pinctrl_enable() errorpath

 - Fix a refcount leak in pinctrl_dt_to_map()

 - Fix selecting the GPIO pin control state and the UART3 pin config
   group in the Intel Baytrail driver

 - Fix readback of schmitt trigger status in the Mediatek Paris driver,
   along with some semantic pin config issues in this driver

 - Fix a pin suffix typo in the Meson A1 driver

 - Fix an erroneous register offset in the Aspeed G6 driver

 - Fix an inconsistent lock state and the interrupt type on resume in
   the Renesas RZG2L driver

 - Fix some minor confusion in the Renesas DT bindings

* tag 'pinctrl-v6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: renesas: rzg2l: Configure the interrupt type on resume
  pinctrl: devicetree: fix refcount leak in pinctrl_dt_to_map()
  pinctrl: baytrail: Add pinconf group for uart3
  pinctrl: baytrail: Fix selecting gpio pinctrl state
  pinctrl: mediatek: paris: Rework support for PIN_CONFIG_{INPUT,OUTPUT}_ENABLE
  pinctrl: mediatek: paris: Fix PIN_CONFIG_INPUT_SCHMITT_ENABLE readback
  pinctrl: core: delete incorrect free in pinctrl_enable()
  pinctrl/meson: fix typo in PDM's pin name
  pinctrl: pinctrl-aspeed-g6: Fix register offset for pinconf of GPIOR-T
  pinctrl: renesas: rzg2l: Execute atomically the interrupt configuration
  dt-bindings: pinctrl: renesas,rzg2l-pinctrl: Allow 'input' and 'output-enable' properties

e5c8fc59

29 Apr, 2024 11 commits

Merge tag 'wq-for-6.9-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq · 98369dcc

Linus Torvalds authored Apr 29, 2024

Pull workqueue fixes from Tejun Heo:
 "Two doc update patches and the following three fixes:

   - On single node systems, the default pool is used but the
     node_nr_active for the default pool was set to min_active. This
     effectively limited the max concurrency of unbound pools on single
     node systems to 8 causing performance regressions on some
     workloads. Fixed by setting the default pool's node_nr_active to
     max_active.

   - wq_update_node_max_active() could trigger divide-by-zero if the
     intersection between the allowed CPUs for an unbound workqueue and
     online CPUs becomes empty.

   - When kick_pool() was trying to repatriate a worker to a CPU in its
     pod by setting task->wake_cpu, it didn't consider whether the CPU
     being selected is online or not which obviously can lead to
     suboptimal behaviors. On s390, this triggered a crash in arch code.
     The workqueue patch removes the gross misbehavior but doesn't fix
     the crash completely as there's a race window in which CPUs can go
     down after wake_cpu is set. Need to decide whether the fix should
     be on the core or arch side"

* tag 'wq-for-6.9-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: Fix divide error in wq_update_node_max_active()
  workqueue: The default node_nr_active should have its max set to max_active
  workqueue: Fix selection of wake_cpu in kick_pool()
  docs/zh_CN: core-api: Update translation of workqueue.rst to 6.9-rc1
  Documentation/core-api: Update events_freezable_power references.

98369dcc

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · d03d4188

Linus Torvalds authored Apr 29, 2024

Pull SCSI fix from James Bottomley:
 "Minor core fix to prevent the sd driver printing the stream count
  every time we rescan and instead print only if it's changed"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: sd: Only print updates to permanent stream count

d03d4188

Merge tag 'nfsd-6.9-6' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux · a91bae87

Linus Torvalds authored Apr 29, 2024

Pull nfsd fix from Chuck Lever:

 - Avoid freeing unallocated memory (v6.7 regression)

* tag 'nfsd-6.9-6' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  NFSD: Fix nfsd4_encode_fattr4() crasher

a91bae87

Merge tag 'nfs-for-6.9-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs · 9e4bc4bc

Linus Torvalds authored Apr 29, 2024

Pull NFS client fixes from Trond Myklebust:

 - Fix an Oops in xs_tcp_tls_setup_socket

 - Fix an Oops due to missing error handling in nfs_net_init()

* tag 'nfs-for-6.9-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  nfs: Handle error of rpc_proc_register() in nfs_net_init().
  SUNRPC: add a missing rpc_stat for TCP TLS

9e4bc4bc

Merge tag 'bcachefs-2024-04-29' of https://evilpiepirate.org/git/bcachefs · 0a2e2305

Linus Torvalds authored Apr 29, 2024

Pull bcachefs fixes from Kent Overstreet:
 "Tiny set of fixes this time"

* tag 'bcachefs-2024-04-29' of https://evilpiepirate.org/git/bcachefs:
  bcachefs: fix integer conversion bug
  bcachefs: btree node scan now fills in sectors_written
  bcachefs: Remove accidental debug assert

0a2e2305

Merge tag 'erofs-for-6.9-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs · b947cc5b

Linus Torvalds authored Apr 29, 2024

Pull erofs fixes from Gao Xiang:
 "Three fixes related to EROFS fscache mode. The most important two
  patches fix calling kill_block_super() in bdev-based mode instead of
  kill_anon_super(). The remaining patch is an informative one.

  Summary:

   - Better error message when prepare_ondemand_read failed

   - Fix unmount of bdev-based mode if CONFIG_EROFS_FS_ONDEMAND is on"

* tag 'erofs-for-6.9-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: reliably distinguish block based and fscache mode
  erofs: get rid of erofs_fs_context
  erofs: modify the error message when prepare_ondemand_read failed

b947cc5b

bounds: Use the right number of bits for power-of-two CONFIG_NR_CPUS · 5af385f5

Matthew Wilcox (Oracle) authored Apr 29, 2024

bits_per() rounds up to the next power of two when passed a power of
two.  This causes crashes on some machines and configurations.
Reported-by: Михаил Новоселов <m.novosyolov@rosalinux.ru>
Tested-by: Ильфат Гаптрахманов <i.gaptrakhmanov@rosalinux.ru>
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3347
Link: https://lore.kernel.org/all/1c978cf1-2934-4e66-e4b3-e81b04cb3571@rosalinux.ru/
Fixes: f2d5dcb4 (bounds: support non-power-of-two CONFIG_NR_CPUS)
Cc: <stable@vger.kernel.org>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
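
The off-by-one described above is easy to see with small numbers. The following is a minimal Python sketch that models `bits_per(n)` as "bits needed to store the value n" (`ilog2(n) + 1`); `id_bits()` is a hypothetical helper name for the count actually needed to store CPU numbers `0 .. n-1`, and the commit message does not say which helper the patch itself uses.

```python
def ilog2(n):
    """floor(log2(n)) for n >= 1."""
    return n.bit_length() - 1


def bits_per(n):
    """Bits needed to store the value n itself (toy model of the macro)."""
    return 1 if n < 2 else ilog2(n) + 1


def id_bits(count):
    """Bits needed to store IDs 0 .. count-1, i.e. ceil(log2(count))."""
    return 0 if count <= 1 else ilog2(count - 1) + 1
```

For a power of two such as 8, `bits_per(8)` is 4 although CPU numbers 0..7 fit in `id_bits(8)` = 3 bits; for a non-power-of-two such as 6, both give 3.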

5af385f5

platform/x86: ISST: Add Grand Ridge to HPM CPU list · 515a3c3a

Srinivas Pandruvada authored Apr 22, 2024

Add Grand Ridge (ATOM_CRESTMONT) to hpm_cpu_ids, so that MSR 0x54 can be
used.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://lore.kernel.org/r/20240422212222.3881606-1-srinivas.pandruvada@linux.intel.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>

515a3c3a

bcachefs: fix integer conversion bug · c258c08a

Kent Overstreet authored Apr 25, 2024

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

c258c08a

bcachefs: btree node scan now fills in sectors_written · f7c3dc26

Kent Overstreet authored Apr 25, 2024

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

f7c3dc26

bcachefs: Remove accidental debug assert · ae927653

Kent Overstreet authored Apr 22, 2024

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

ae927653

28 Apr, 2024 9 commits

Linux 6.9-rc6 · e67572cd
Linus Torvalds authored Apr 28, 2024

e67572cd

Merge tag 'sched-urgent-2024-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 245c8e81

Linus Torvalds authored Apr 28, 2024

Pull scheduler fixes from Ingo Molnar:

 - Fix EEVDF corner cases

 - Fix two nohz_full= related bugs that can cause boot crashes
   and warnings

* tag 'sched-urgent-2024-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/isolation: Fix boot crash when maxcpus < first housekeeping CPU
  sched/isolation: Prevent boot crash when the boot CPU is nohz_full
  sched/eevdf: Prevent vlag from going out of bounds in reweight_eevdf()
  sched/eevdf: Fix miscalculation in reweight_entity() when se is not curr
  sched/eevdf: Always update V if se->on_rq when reweighting

245c8e81

Merge tag 'x86-urgent-2024-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · aec147c1

Linus Torvalds authored Apr 28, 2024

Pull x86 fixes from Ingo Molnar:

 - Make the CPU_MITIGATIONS=n interaction with conflicting
   mitigation-enabling boot parameters a bit saner.

 - Re-enable CPU mitigations by default on non-x86

 - Fix TDX shared bit propagation on mprotect()

 - Fix potential show_regs() system hang when PKE initialization
   is not fully finished yet.

 - Add the 0x10-0x1f model IDs to the Zen5 range

 - Harden #VC instruction emulation some more

* tag 'x86-urgent-2024-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  cpu: Ignore "mitigations" kernel parameter if CPU_MITIGATIONS=n
  cpu: Re-enable CPU mitigations by default for !X86 architectures
  x86/tdx: Preserve shared bit on mprotect()
  x86/cpu: Fix check for RDPKRU in __show_regs()
  x86/CPU/AMD: Add models 0x10-0x1f to the Zen5 range
  x86/sev: Check for MWAITX and MONITORX opcodes in the #VC handler

aec147c1

Merge tag 'irq-urgent-2024-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 8d62e9bf

Linus Torvalds authored Apr 28, 2024

Pull irq fix from Ingo Molnar:
 "Fix a double free bug in the init error path of the GICv3 irqchip
  driver"

* tag 'irq-urgent-2024-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/gic-v3-its: Prevent double free on error

8d62e9bf

erofs: reliably distinguish block based and fscache mode · 7af2ae1b

Christian Brauner authored Apr 19, 2024

When erofs_kill_sb() is called in block dev based mode, s_bdev may not
have been initialised yet, and if CONFIG_EROFS_FS_ONDEMAND is enabled,
it will be mistaken for fscache mode, and then attempt to free an anon_dev
that has never been allocated, triggering the following warning:

============================================
ida_free called for id=0 which is not allocated.
WARNING: CPU: 14 PID: 926 at lib/idr.c:525 ida_free+0x134/0x140
Modules linked in:
CPU: 14 PID: 926 Comm: mount Not tainted 6.9.0-rc3-dirty #630
RIP: 0010:ida_free+0x134/0x140
Call Trace:
 <TASK>
 erofs_kill_sb+0x81/0x90
 deactivate_locked_super+0x35/0x80
 get_tree_bdev+0x136/0x1e0
 vfs_get_tree+0x2c/0xf0
 do_new_mount+0x190/0x2f0
 [...]
============================================

Now when erofs_kill_sb() is called, erofs_sb_info must have been
initialised, so use sbi->fsid to distinguish between the two modes.
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20240419123611.947084-3-libaokun1@huawei.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>

7af2ae1b

erofs: get rid of erofs_fs_context · 07abe43a

Baokun Li authored Apr 19, 2024

Instead of allocating the erofs_sb_info in fill_super(), allocate it during
erofs_init_fs_context() and ensure that erofs can always have the info
available during erofs_kill_sb(). After this, erofs_fs_context is no longer
needed, so replace ctx with sbi. No functional changes.
Suggested-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20240419123611.947084-2-libaokun1@huawei.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>

07abe43a

erofs: modify the error message when prepare_ondemand_read failed · 17597b1e

Hongbo Li authored Apr 24, 2024

When prepare_ondemand_read fails, the wrong error message is printed.
The prepare_read is also implemented in cachefiles, so we amend it.
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20240424084247.759432-1-lihongbo22@huawei.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>

17597b1e

sched/isolation: Fix boot crash when maxcpus < first housekeeping CPU · 257bf89d

Oleg Nesterov authored Apr 13, 2024

housekeeping_setup() checks cpumask_intersects(present, online) to ensure
that the kernel will have at least one housekeeping CPU after smp_init(),
but this doesn't work if the maxcpus= kernel parameter limits the number of
processors available after bootup.

For example, a kernel with "maxcpus=2 nohz_full=0-2" parameters crashes at
boot time on a virtual machine with 4 CPUs.

Change housekeeping_setup() to use cpumask_first_and() and check that the
returned CPU number is valid and less than setup_max_cpus.

Another corner case is "nohz_full=0" on a machine with a single CPU or with
the maxcpus=1 kernel argument. In this case non_housekeeping_mask is empty
and tick_nohz_full_setup() makes no sense. And indeed, the kernel hits the
WARN_ON(tick_nohz_full_running) in tick_sched_do_timer().

And how should the kernel interpret the "nohz_full=" parameter? It should
be silently ignored, but currently cpulist_parse() happily returns the
empty cpumask and this leads to the same problem.

Change housekeeping_setup() to check cpumask_empty(non_housekeeping_mask)
and do nothing in this case.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Phil Auld <pauld@redhat.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20240413141746.GA10008@redhat.com
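
The two checks described above can be modelled with plain sets of CPU numbers. This is a hedged Python sketch rather than the kernel code: the function names other than `cpumask_first_and` are hypothetical, the housekeeping set is assumed to be "every possible CPU not listed in nohz_full=", and what the kernel does after a failed check is left out.

```python
def cpumask_first_and(a, b, nr_cpu_ids):
    """Lowest CPU present in both sets, or nr_cpu_ids if there is none."""
    common = a & b
    return min(common) if common else nr_cpu_ids


def isolation_requested(non_housekeeping):
    """An empty mask (e.g. a bare "nohz_full=") means there is nothing to do."""
    return bool(non_housekeeping)


def has_usable_housekeeping_cpu(present, non_housekeeping,
                                nr_cpu_ids, setup_max_cpus):
    """True if some housekeeping CPU is present and will be brought up."""
    housekeeping = set(range(nr_cpu_ids)) - non_housekeeping
    first = cpumask_first_and(present, housekeeping, nr_cpu_ids)
    return first < nr_cpu_ids and first < setup_max_cpus
```

With the example from the message (4 CPUs, "maxcpus=2 nohz_full=0-2"), the only housekeeping CPU is 3, which is present but not below `setup_max_cpus`, so the check fails instead of the kernel crashing later.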

257bf89d

sched/isolation: Prevent boot crash when the boot CPU is nohz_full · 5097cbcb

Oleg Nesterov authored Apr 11, 2024

Documentation/timers/no_hz.rst states that the "nohz_full=" mask must not
include the boot CPU, which is no longer true after:

  08ae95f4 ("nohz_full: Allow the boot CPU to be nohz_full").

However after:

  aae17ebb ("workqueue: Avoid using isolated cpus' timers on queue_delayed_work")

the kernel will crash at boot time in this case; housekeeping_any_cpu()
returns an invalid CPU number until smp_init() brings the first
housekeeping CPU up.

Change housekeeping_any_cpu() to check the result of cpumask_any_and() and
return smp_processor_id() in this case.

This is just a simple and backportable workaround which fixes the
symptom. However, smp_processor_id() at boot time should be safe at least
for type == HK_TYPE_TIMER, as this more or less matches the
tick_do_timer_boot_cpu logic.

There is no need to worry about cpu_down(); tick_nohz_cpu_down() will not
allow offlining tick_do_timer_cpu (the 1st online housekeeping CPU).

Fixes: aae17ebb ("workqueue: Avoid using isolated cpus' timers on queue_delayed_work")
Reported-by: Chris von Recklinghausen <crecklin@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Phil Auld <pauld@redhat.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20240411143905.GA19288@redhat.com
Closes: https://lore.kernel.org/all/20240402105847.GA24832@redhat.com/
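
The fallback described above can be sketched with sets of CPU numbers. This is a minimal Python model, not the kernel function: `current_cpu` stands in for smp_processor_id(), the "no CPU found" convention (returning `nr_cpu_ids`) is an assumption, and any other selection logic in the real housekeeping_any_cpu() is omitted.

```python
def cpumask_any_and(a, b, nr_cpu_ids):
    """Some CPU present in both sets, or nr_cpu_ids if there is none."""
    common = a & b
    return min(common) if common else nr_cpu_ids


def housekeeping_any_cpu(housekeeping, online, nr_cpu_ids, current_cpu):
    """Pick an online housekeeping CPU, falling back to the current CPU."""
    cpu = cpumask_any_and(housekeeping, online, nr_cpu_ids)
    if cpu < nr_cpu_ids:
        return cpu
    # Early boot: no housekeeping CPU is online yet, so use the CPU we
    # are running on instead of returning an invalid number.
    return current_cpu
```

With CPU 0 in nohz_full and only CPU 0 online, the model returns 0 rather than an out-of-range value; once a housekeeping CPU comes online it is chosen instead.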

5097cbcb