Commits · 9836e93c0a7e031ac6a71c56171c229de1eea7cf · Kirill Smelkov / linux

23 May, 2022 9 commits

Merge tag 'for-5.19/io_uring-passthrough-2022-05-22' of git://git.kernel.dk/linux-block · 9836e93c

Linus Torvalds authored May 23, 2022

Pull io_uring NVMe command passthrough from Jens Axboe:
 "On top of everything else, this adds support for passthrough for
  io_uring.

  The initial feature for this is NVMe passthrough support, which allows
  non-filesystem based IO commands and admin commands.

  To support this, io_uring grows support for SQE and CQE members that
  are twice as big, allowing to pass in a full NVMe command without
  having to copy data around. And to complete with more than just a
  single 32-bit value as the output"

* tag 'for-5.19/io_uring-passthrough-2022-05-22' of git://git.kernel.dk/linux-block: (22 commits)
  io_uring: cleanup handling of the two task_work lists
  nvme: enable uring-passthrough for admin commands
  nvme: helper for uring-passthrough checks
  blk-mq: fix passthrough plugging
  nvme: add vectored-io support for uring-cmd
  nvme: wire-up uring-cmd support for io-passthru on char-device.
  nvme: refactor nvme_submit_user_cmd()
  block: wire-up support for passthrough plugging
  fs,io_uring: add infrastructure for uring-cmd
  io_uring: support CQE32 for nop operation
  io_uring: enable CQE32
  io_uring: support CQE32 in /proc info
  io_uring: add tracing for additional CQE32 fields
  io_uring: overflow processing for CQE32
  io_uring: flush completions for CQE32
  io_uring: modify io_get_cqe for CQE32
  io_uring: add CQE32 completion processing
  io_uring: add CQE32 setup processing
  io_uring: change ring size calculation for CQE32
  io_uring: store add. return values for CQE32
  ...

9836e93c

Merge tag 'for-5.19/io_uring-net-2022-05-22' of git://git.kernel.dk/linux-block · e1a8fde7

Linus Torvalds authored May 23, 2022

Pull io_uring 'more data in socket' support from Jens Axboe:
 "To be able to fully utilize the 'poll first' support in the core
  io_uring branch, it's advantageous knowing if the socket was empty
  after a receive. This adds support for that"

* tag 'for-5.19/io_uring-net-2022-05-22' of git://git.kernel.dk/linux-block:
  io_uring: return hint on whether more data is available after receive
  tcp: pass back data left in socket after receive

e1a8fde7

Merge tag 'for-5.19/io_uring-socket-2022-05-22' of git://git.kernel.dk/linux-block · 368da430

Linus Torvalds authored May 23, 2022

Pull io_uring socket() support from Jens Axboe:
 "This adds support for socket(2) for io_uring. This is handy when using
  direct / registered file descriptors with io_uring.

  Outside of those two patches, a small series from Dylan on top that
  improves the tracing by providing a text representation of the opcode
  rather than needing to decode this by reading the header file every
  time.

  That sits in this branch as it was the last opcode added (until it
  wasn't...)"

* tag 'for-5.19/io_uring-socket-2022-05-22' of git://git.kernel.dk/linux-block:
  io_uring: use the text representation of ops in trace
  io_uring: rename op -> opcode
  io_uring: add io_uring_get_opcode
  io_uring: add type to op enum
  io_uring: add socket(2) support
  net: add __sys_socket_file()

368da430

Merge tag 'for-5.19/io_uring-xattr-2022-05-22' of git://git.kernel.dk/linux-block · 09beaff7

Linus Torvalds authored May 23, 2022

Pull io_uring xattr support from Jens Axboe:
 "Support for the xattr variants"

* tag 'for-5.19/io_uring-xattr-2022-05-22' of git://git.kernel.dk/linux-block:
  io_uring: cleanup error-handling around io_req_complete
  io_uring: fix trace for reduced sqe padding
  io_uring: add fgetxattr and getxattr support
  io_uring: add fsetxattr and setxattr support
  fs: split off do_getxattr from getxattr
  fs: split off setxattr_copy and do_setxattr function from setxattr

09beaff7

Merge tag 'for-5.19/io_uring-2022-05-22' of git://git.kernel.dk/linux-block · 3a166bdb

Linus Torvalds authored May 23, 2022

Pull io_uring updates from Jens Axboe:
 "Here are the main io_uring changes for 5.19. This contains:

   - Fixes for sparse type warnings (Christoph, Vasily)

   - Support for multi-shot accept (Hao)

   - Support for io_uring managed fixed files, rather than always
     needing the applicationt o manage the indices (me)

   - Fix for a spurious poll wakeup (Dylan)

   - CQE overflow fixes (Dylan)

   - Support more types of cancelations (me)

   - Support for co-operative task_work signaling, rather than always
     forcing an IPI (me)

   - Support for doing poll first when appropriate, rather than always
     attempting a transfer first (me)

   - Provided buffer cleanups and support for mapped buffers (me)

   - Improve how io_uring handles inflight SCM files (Pavel)

   - Speedups for registered files (Pavel, me)

   - Organize the completion data in a struct in io_kiocb rather than
     keep it in separate spots (Pavel)

   - task_work improvements (Pavel)

   - Cleanup and optimize the submission path, in general and for
     handling links (Pavel)

   - Speedups for registered resource handling (Pavel)

   - Support sparse buffers and file maps (Pavel, me)

   - Various fixes and cleanups (Almog, Pavel, me)"

* tag 'for-5.19/io_uring-2022-05-22' of git://git.kernel.dk/linux-block: (111 commits)
  io_uring: fix incorrect __kernel_rwf_t cast
  io_uring: disallow mixed provided buffer group registrations
  io_uring: initialize io_buffer_list head when shared ring is unregistered
  io_uring: add fully sparse buffer registration
  io_uring: use rcu_dereference in io_close
  io_uring: consistently use the EPOLL* defines
  io_uring: make apoll_events a __poll_t
  io_uring: drop a spurious inline on a forward declaration
  io_uring: don't use ERR_PTR for user pointers
  io_uring: use a rwf_t for io_rw.flags
  io_uring: add support for ring mapped supplied buffers
  io_uring: add io_pin_pages() helper
  io_uring: add buffer selection support to IORING_OP_NOP
  io_uring: fix locking state for empty buffer group
  io_uring: implement multishot mode for accept
  io_uring: let fast poll support multishot
  io_uring: add REQ_F_APOLL_MULTISHOT for requests
  io_uring: add IORING_ACCEPT_MULTISHOT for accept
  io_uring: only wake when the correct events are set
  io_uring: avoid io-wq -EAGAIN looping for !IOPOLL
  ...

3a166bdb

Merge tag 'rcu.2022.05.19a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu · 1e57930e

Linus Torvalds authored May 23, 2022

Pull RCU update from Paul McKenney:

 - Documentation updates

 - Miscellaneous fixes

 - Callback-offloading updates, mainly simplifications

 - RCU-tasks updates, including some -rt fixups, handling of systems
   with sparse CPU numbering, and a fix for a boot-time race-condition
   failure

 - Put SRCU on a memory diet in order to reduce the size of the
   srcu_struct structure

 - Torture-test updates fixing some bugs in tests and closing some
   testing holes

 - Torture-test updates for the RCU tasks flavors, most notably ensuring
   that building rcutorture and friends does not change the
   RCU-tasks-related Kconfig options

 - Torture-test scripting updates

 - Expedited grace-period updates, most notably providing
   milliseconds-scale (not all that) soft real-time response from
   synchronize_rcu_expedited().

   This is also the first time in almost 30 years of RCU that someone
   other than me has pushed for a reduction in the RCU CPU stall-warning
   timeout, in this case by more than three orders of magnitude from 21
   seconds to 20 milliseconds. This tighter timeout applies only to
   expedited grace periods

* tag 'rcu.2022.05.19a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (80 commits)
  rcu: Move expedited grace period (GP) work to RT kthread_worker
  rcu: Introduce CONFIG_RCU_EXP_CPU_STALL_TIMEOUT
  srcu: Drop needless initialization of sdp in srcu_gp_start()
  srcu: Prevent expedited GPs and blocking readers from consuming CPU
  srcu: Add contention check to call_srcu() srcu_data ->lock acquisition
  srcu: Automatically determine size-transition strategy at boot
  rcutorture: Make torture.sh allow for --kasan
  rcutorture: Make torture.sh refscale and rcuscale specify Tasks Trace RCU
  rcutorture: Make kvm.sh allow more memory for --kasan runs
  torture: Save "make allmodconfig" .config file
  scftorture: Remove extraneous "scf" from per_version_boot_params
  rcutorture: Adjust scenarios' Kconfig options for CONFIG_PREEMPT_DYNAMIC
  torture: Enable CSD-lock stall reports for scftorture
  torture: Skip vmlinux check for kvm-again.sh runs
  scftorture: Adjust for TASKS_RCU Kconfig option being selected
  rcuscale: Allow rcuscale without RCU Tasks Rude/Trace
  rcuscale: Allow rcuscale without RCU Tasks
  refscale: Allow refscale without RCU Tasks Rude/Trace
  refscale: Allow refscale without RCU Tasks
  rcutorture: Allow specifying per-scenario stat_interval
  ...

1e57930e

Merge tag 'lkmm.2022.05.20a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu · b2f02e9c

Linus Torvalds authored May 23, 2022

Pull LKMM update from Paul McKenney:
 "This updates the klitmus7 compatibility table to indicate that
  herdtools7 7.56.1 or better is required for Linux kernel v5.17 or
  later"

* tag 'lkmm.2022.05.20a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
  tools/memory-model/README: Update klitmus7 compat table

b2f02e9c

Merge tag 'nolibc.2022.05.20a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu · f814957b

Linus Torvalds authored May 23, 2022

Pull nolibc library updates from Paul McKenney:
 "This adds a number of library functions and splits this library into
  multiple files"

* tag 'nolibc.2022.05.20a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (61 commits)
  tools/nolibc/string: Implement `strdup()` and `strndup()`
  tools/nolibc/string: Implement `strnlen()`
  tools/nolibc/stdlib: Implement `malloc()`, `calloc()`, `realloc()` and `free()`
  tools/nolibc/types: Implement `offsetof()` and `container_of()` macro
  tools/nolibc/sys: Implement `mmap()` and `munmap()`
  tools/nolibc: i386: Implement syscall with 6 arguments
  tools/nolibc: Remove .global _start from the entry point code
  tools/nolibc: Replace `asm` with `__asm__`
  tools/nolibc: x86-64: Update System V ABI document link
  tools/nolibc/stdlib: only reference the external environ when inlined
  tools/nolibc/string: do not use __builtin_strlen() at -O0
  tools/nolibc: add the nolibc subdir to the common Makefile
  tools/nolibc: add a makefile to install headers
  tools/nolibc/types: add poll() and waitpid() flag definitions
  tools/nolibc/sys: add syscall definition for getppid()
  tools/nolibc/string: add strcmp() and strncmp()
  tools/nolibc/stdio: add support for '%p' to vfprintf()
  tools/nolibc/stdlib: add a simple getenv() implementation
  tools/nolibc/stdio: make printf(%s) accept NULL
  tools/nolibc/stdlib: implement abort()
  ...

f814957b

Merge tag 'efi-next-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi · bf243102

Linus Torvalds authored May 23, 2022

Pull EFI updates from Ard Biesheuvel:

 - Allow runtime services to be re-enabled at boot on RT kernels.

 - Provide access to secrets injected into the boot image by CoCo
   hypervisors (COnfidential COmputing)

 - Use DXE services on x86 to make the boot image executable after
   relocation, if needed.

 - Prefer mirrored memory for randomized allocations.

 - Only randomize the placement of the kernel image on arm64 if the
   loader has not already done so.

 - Add support for obtaining the boot hartid from EFI on RISC-V.

* tag 'efi-next-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  riscv/efi_stub: Add support for RISCV_EFI_BOOT_PROTOCOL
  efi: stub: prefer mirrored memory for randomized allocations
  efi/arm64: libstub: run image in place if randomized by the loader
  efi: libstub: pass image handle to handle_kernel_image()
  efi: x86: Set the NX-compatibility flag in the PE header
  efi: libstub: ensure allocated memory to be executable
  efi: libstub: declare DXE services table
  efi: Add missing prototype for efi_capsule_setup_info
  docs: security: Add secrets/coco documentation
  efi: Register efi_secret platform device if EFI secret area is declared
  virt: Add efi_secret module to expose confidential computing secrets
  efi: Save location of EFI confidential computing area
  efi: Allow to enable EFI runtime services by default on RT

bf243102

22 May, 2022 4 commits

Linux 5.18 · 4b0986a3
Linus Torvalds authored May 22, 2022

4b0986a3

afs: Fix afs_getattr() to refetch file status if callback break occurred · 2aeb8c86

David Howells authored May 21, 2022

If a callback break occurs (change notification), afs_getattr() needs to
issue an FS.FetchStatus RPC operation to update the status of the file
being examined by the stat-family of system calls.

Fix afs_getattr() to do this if AFS_VNODE_CB_PROMISED has been cleared
on a vnode by a callback break. Skip this if AT_STATX_DONT_SYNC is set.

This can be tested by appending to a file on one AFS client and then
using "stat -L" to examine its length on a machine running kafs. This
can also be watched through tracing on the kafs machine. The callback
break is seen:

kworker/1:1-46 [001] ..... 978.910812: afs_cb_call: c=0000005f YFSCB.CallBack
kworker/1:1-46 [001] ...1. 978.910829: afs_cb_break: 100058:23b4c:242d2c2 b=2 s=1 break-cb
kworker/1:1-46 [001] ..... 978.911062: afs_call_done: c=0000005f ret=0 ab=0 [0000000082994ead]

And then the stat command generated no traffic if unpatched, but with
this change a call to fetch the status can be observed:

stat-4471 [000] ..... 986.744122: afs_make_fs_call: c=000000ab 100058:023b4c:242d2c2 YFS.FetchStatus
stat-4471 [000] ..... 986.745578: afs_call_done: c=000000ab ret=0 ab=0 [0000000087fc8c84]

Fixes: 08e0e7c8 ("[AF_RXRPC]: Make the in-kernel AFS filesystem use AF_RXRPC.")
Reported-by: Markus Suvanto <markus.suvanto@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Tested-by: Markus Suvanto <markus.suvanto@gmail.com>
Tested-by: kafs-testing+fedora34_64checkkafs-build-496@auristor.com
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216010
Link: https://lore.kernel.org/r/165308359800.162686.14122417881564420962.stgit@warthog.procyon.org.uk/ # v1
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2aeb8c86

Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · 978df3e1

Linus Torvalds authored May 22, 2022

Pull i2c fixes from Wolfram Sang:
 "Some I2C driver bugfixes for 5.18. Nothing spectacular but worth
  fixing"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  drivers: i2c: thunderx: Allow driver to work with ACPI defined TWSI controllers
  i2c: ismt: Provide a DMA buffer for Interrupt Cause Logging
  i2c: mt7621: fix missing clk_disable_unprepare() on error in mtk_i2c_probe()

978df3e1

Merge tag 'perf-tools-fixes-for-v5.18-2022-05-21' of... · eaea45fc

Linus Torvalds authored May 21, 2022

Merge tag 'perf-tools-fixes-for-v5.18-2022-05-21' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tools fixes from Arnaldo Carvalho de Melo:

 - Fix and validate CPU map inputs in synthetic PERF_RECORD_STAT events
   in 'perf stat'.

 - Fix x86's arch__intr_reg_mask() for the hybrid platform.

 - Address 'perf bench numa' compiler error on s390.

 - Fix check for btf__load_from_kernel_by_id() in libbpf.

 - Fix "all PMU test" 'perf test' to skip hv_24x7/hv_gpci tests on
   powerpc.

 - Fix session topology test to skip the test in guest environment.

 - Skip BPF 'perf test' if clang is not present.

 - Avoid shell test description infinite loop in 'perf test'.

 - Fix Intel LBR callstack entries and nr print message.

* tag 'perf-tools-fixes-for-v5.18-2022-05-21' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
  perf session: Fix Intel LBR callstack entries and nr print message
  perf test bpf: Skip test if clang is not present
  perf test session topology: Fix test to skip the test in guest environment
  perf bench numa: Address compiler error on s390
  perf test: Avoid shell test description infinite loop
  perf regs x86: Fix arch__intr_reg_mask() for the hybrid platform
  perf test: Fix "all PMU test" to skip hv_24x7/hv_gpci tests on powerpc
  perf stat: Fix and validate CPU map inputs in synthetic PERF_RECORD_STAT events
  perf build: Fix check for btf__load_from_kernel_by_id() in libbpf

eaea45fc

21 May, 2022 16 commits

Merge tag 'input-for-v5.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · 4c493b1a

Linus Torvalds authored May 21, 2022

Pull input fixes from Dmitry Torokhov:
 "A small fixup to ili210x touchscreen driver, and updated maintainer
  entry for the device tree binding of Mediatek 6779 keypad:

   - fix reset timing of Ilitek touchscreens

   - update maintainer entry of DT binding of Mediatek 6779 keypad"

* tag 'input-for-v5.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: ili210x - use one common reset implementation
  Input: ili210x - fix reset timing
  dt-bindings: input: mediatek,mt6779-keypad: update maintainer

4c493b1a

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 36ed2da7

Linus Torvalds authored May 21, 2022

Pull SCSI fixes from James Bottomley:
 "Two patches, both in drivers.

  The iscsi one is fixing the cpumask issue you commented on and the ufs
  one is a late arriving fix for conditions that can occur in Host
  Performance Booster reads"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: ufs: core: Fix referencing invalid rsp field
  scsi: target: Fix incorrect use of cpumask_t

36ed2da7

perf session: Fix Intel LBR callstack entries and nr print message · 51d0bf99

Chengdong Li authored May 17, 2022

When generating callstack information from branch_stack(Intel LBR), the
actual number of callstack entry should be bigger than the number of
branch_stack, for example:

	branch_stack records:
		B() -> C()
		A() -> B()
	converted callstack records should be:
		C()
		B()
		A()
though, the number of callstack equals
to the number of branch stack plus 1.

This patch fixes above issue in branch_stack__printf(). For example,

	# echo 'scale=2000; 4*a(1)' > cmd
	# perf record --call-graph lbr bc -l < cmd

Before applying this patch, `perf script -D` output:

	1220022677386876 0x2a40 [0xd8]: PERF_RECORD_SAMPLE(IP, 0x4002): 17990/17990: 0x40a6d6 period: 894172 addr: 0
	... LBR call chain: nr:8
	.....  0: fffffffffffffe00
	.....  1: 000000000040a410
	.....  2: 000000000040573c
	.....  3: 0000000000408650
	.....  4: 00000000004022f2
	.....  5: 00000000004015f5
	.....  6: 00007f5ed6dcb553
	.....  7: 0000000000401698
	... FP chain: nr:2
	.....  0: fffffffffffffe00
	.....  1: 000000000040a6d8
	... branch callstack: nr:6    # which is not consistent with LBR records.
	.....  0: 000000000040a410
	.....  1: 0000000000408650    # ditto
	.....  2: 00000000004022f2
	.....  3: 00000000004015f5
	.....  4: 00007f5ed6dcb553
	.....  5: 0000000000401698
	 ... thread: bc:17990
	 ...... dso: /usr/bin/bc
	bc 17990 1220022.677386:     894172 cycles:
			  40a410 [unknown] (/usr/bin/bc)
			  40573c [unknown] (/usr/bin/bc)
			  408650 [unknown] (/usr/bin/bc)
			  4022f2 [unknown] (/usr/bin/bc)
			  4015f5 [unknown] (/usr/bin/bc)
		    7f5ed6dcb553 __libc_start_main+0xf3 (/usr/lib64/libc-2.17.so)
			  401698 [unknown] (/usr/bin/bc)

After applied:

	1220022677386876 0x2a40 [0xd8]: PERF_RECORD_SAMPLE(IP, 0x4002): 17990/17990: 0x40a6d6 period: 894172 addr: 0
	... LBR call chain: nr:8
	.....  0: fffffffffffffe00
	.....  1: 000000000040a410
	.....  2: 000000000040573c
	.....  3: 0000000000408650
	.....  4: 00000000004022f2
	.....  5: 00000000004015f5
	.....  6: 00007f5ed6dcb553
	.....  7: 0000000000401698
	... FP chain: nr:2
	.....  0: fffffffffffffe00
	.....  1: 000000000040a6d8
	... branch callstack: nr:7
	.....  0: 000000000040a410
	.....  1: 000000000040573c
	.....  2: 0000000000408650
	.....  3: 00000000004022f2
	.....  4: 00000000004015f5
	.....  5: 00007f5ed6dcb553
	.....  6: 0000000000401698
	 ... thread: bc:17990
	 ...... dso: /usr/bin/bc
	bc 17990 1220022.677386:     894172 cycles:
			  40a410 [unknown] (/usr/bin/bc)
			  40573c [unknown] (/usr/bin/bc)
			  408650 [unknown] (/usr/bin/bc)
			  4022f2 [unknown] (/usr/bin/bc)
			  4015f5 [unknown] (/usr/bin/bc)
		    7f5ed6dcb553 __libc_start_main+0xf3 (/usr/lib64/libc-2.17.so)
			  401698 [unknown] (/usr/bin/bc)

Change from v1:
	- refined code style according to Jiri's review comments.
Signed-off-by: Chengdong Li <chengdongli@tencent.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: likexu@tencent.com
Link: https://lore.kernel.org/r/20220517015726.96131-1-chengdongli@tencent.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

51d0bf99

perf test bpf: Skip test if clang is not present · 8994e97b

Athira Rajeev authored May 11, 2022

Perf BPF filter test fails in environment where "clang" is not
installed.

Test failure logs:

<<>>
 42: BPF filter                    :
 42.1: Basic BPF filtering         : Skip
 42.2: BPF pinning                 : FAILED!
 42.3: BPF prologue generation     : FAILED!
<<>>

Enabling verbose option provided debug logs which says clang/llvm needs
to be installed. Snippet of verbose logs:

<<>>
 42.2: BPF pinning                  :
 --- start ---
test child forked, pid 61423
ERROR:	unable to find clang.
Hint:	Try to install latest clang/llvm to support BPF.
        Check your $PATH

<<logs_here>>

Failed to compile test case: 'Basic BPF llvm compile'
Unable to get BPF object, fix kbuild first
test child finished with -1
 ---- end ----
BPF filter subtest 2: FAILED!
<<>>

Here subtests, "BPF pinning" and "BPF prologue generation" failed and
logs shows clang/llvm is needed. After installing clang, testcase
passes.

Reason on why subtest failure happens though logs has proper debug
information:

Main function __test__bpf calls test_llvm__fetch_bpf_obj by
passing 4th argument as true ( 4th arguments maps to parameter
"force" in test_llvm__fetch_bpf_obj ). But this will cause
test_llvm__fetch_bpf_obj to skip the check for clang/llvm.

Snippet of code part which checks for clang based on
parameter "force" in test_llvm__fetch_bpf_obj:

<<>>
if (!force && (!llvm_param.user_set_param &&
<<>>

Since force is set to "false", test won't get skipped and fails to
compile test case. The BPF code compilation needs clang, So pass the
fourth argument as "false" and also skip the test if reason for return
is "TEST_SKIP"

After the patch:

<<>>
 42: BPF filter                    :
 42.1: Basic BPF filtering         : Skip
 42.2: BPF pinning                 : Skip
 42.3: BPF prologue generation     : Skip
<<>>

Fixes: ba1fae43 ("perf test: Add 'perf test BPF'")
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nageswara R Sastry <rnsastry@linux.ibm.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lore.kernel.org/r/20220511115438.84032-1-atrajeev@linux.vnet.ibm.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

8994e97b

perf test session topology: Fix test to skip the test in guest environment · cfd7092c

Athira Rajeev authored May 11, 2022

The session topology test fails in powerpc pSeries platform.

Test logs:

  <<>>
  Session topology : FAILED!
  <<>>

This testcases tests cpu topology by checking the core_id and socket_id
stored in perf_env from perf session. The data from perf session is
compared with the cpu topology information from
"/sys/devices/system/cpu/cpuX/topology" like core_id,
physical_package_id.

In case of virtual environment, detail like physical_package_id is
restricted to be exposed. Hence physical_package_id is set to -1. The
testcase fails on such platforms since socket_id can't be fetched from
topology info.

Skip the testcase in powerpc if physical_package_id returns -1.
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>---
Tested-by: Disha Goel <disgoel@linux.vnet.ibm.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nageswara R Sastry <rnsastry@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20220511114959.84002-1-atrajeev@linux.vnet.ibm.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

cfd7092c

perf bench numa: Address compiler error on s390 · f8ac1c47

Thomas Richter authored May 20, 2022

The compilation on s390 results in this error:

  # make DEBUG=y bench/numa.o
  ...
  bench/numa.c: In function ‘__bench_numa’:
  bench/numa.c:1749:81: error: ‘%d’ directive output may be truncated
              writing between 1 and 11 bytes into a region of size between
              10 and 20 [-Werror=format-truncation=]
  1749 |        snprintf(tname, sizeof(tname), "process%d:thread%d", p, t);
                                                               ^~
  ...
  bench/numa.c:1749:64: note: directive argument in the range
                 [-2147483647, 2147483646]
  ...
  #

The maximum length of the %d replacement is 11 characters because of the
negative sign.  Therefore extend the array by two more characters.

Output after:

  # make  DEBUG=y bench/numa.o > /dev/null 2>&1; ll bench/numa.o
  -rw-r--r-- 1 root root 418320 May 19 09:11 bench/numa.o
  #

Fixes: 3aff8ba0 ("perf bench numa: Avoid possible truncation when using snprintf()")
Suggested-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: https://lore.kernel.org/r/20220520081158.2990006-1-tmricht@linux.ibm.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

f8ac1c47

perf test: Avoid shell test description infinite loop · caaaa554

Ian Rogers authored May 17, 2022

for_each_shell_test() is already strict in expecting tests to be files
and executable. It is sometimes possible when it iterates over all files
that it finds one that is executable and lacks a newline character. When
this happens the loop never terminates as it doesn't check for EOF.

Add the EOF check to make this loop at least bounded by the file size.

If the description is returned as NULL then also skip the test.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Marco Elver <elver@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Sohaib Mohamed <sohaib.amhmd@gmail.com>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20220517204144.645913-1-irogers@google.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

caaaa554

perf regs x86: Fix arch__intr_reg_mask() for the hybrid platform · 01b28e4a

Kan Liang authored May 18, 2022

The X86 specific arch__intr_reg_mask() is to check whether the kernel
and hardware can collect XMM registers. But it doesn't work on some
hybrid platform.

Without the patch on ADL-N:

  $ perf record -I?
  available registers: AX BX CX DX SI DI BP SP IP FLAGS CS SS R8 R9 R10
  R11 R12 R13 R14 R15

The config of the test event doesn't contain the PMU information. The
kernel may fail to initialize it on the correct hybrid PMU and return
the wrong non-supported information.

Add the PMU information into the config for the hybrid platform. The
same register set is supported among different hybrid PMUs. Checking
the first available one is good enough.

With the patch on ADL-N:

  $ perf record -I?
  available registers: AX BX CX DX SI DI BP SP IP FLAGS CS SS R8 R9 R10
  R11 R12 R13 R14 R15 XMM0 XMM1 XMM2 XMM3 XMM4 XMM5 XMM6 XMM7 XMM8 XMM9
  XMM10 XMM11 XMM12 XMM13 XMM14 XMM15

Fixes: 6466ec14 ("perf regs x86: Add X86 specific arch__intr_reg_mask()")
Reported-by: Ammy Yi <ammy.yi@intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220518145125.1494156-1-kan.liang@linux.intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

01b28e4a

perf test: Fix "all PMU test" to skip hv_24x7/hv_gpci tests on powerpc · 451ed805

Athira Rajeev authored May 20, 2022

"perf all PMU test" picks the input events from "perf list --raw-dump
pmu" list and runs "perf stat -e" for each of the event in the list. In
case of powerpc, the PowerVM environment supports events from hv_24x7
and hv_gpci PMU which is of example format like below:

- hv_24x7/CPM_ADJUNCT_INST,domain=?,core=?/
- hv_gpci/event,partition_id=?/

The value for "?" needs to be filled in depending on system and
respective event. CPM_ADJUNCT_INST needs have core value and domain
value. hv_gpci event needs partition_id.  Similarly, there are other
events for hv_24x7 and hv_gpci having "?" in event format. Hence skip
these events on powerpc platform since values like partition_id, domain
is specific to system and event.

Fixes: 3d5ac9ef ("perf test: Workload test of all PMUs")
Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nageswara R Sastry <rnsastry@linux.ibm.com>
Link: https://lore.kernel.org/r/20220520101236.17249-1-atrajeev@linux.vnet.ibm.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

451ed805

io_uring: cleanup handling of the two task_work lists · 3fe07bcd

Jens Axboe authored May 21, 2022

Rather than pass in a bool for whether or not this work item needs to go
into the priority list or not, provide separate helpers for it. For most
use cases, this also then gets rid of the branch for non-priority task
work.

While at it, rename the prior_task_list to prio_task_list. Prior is
a confusing name for it, as it would seem to indicate that this is the
previous task_work list. prio makes it clear that this is a priority
task_work list.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

3fe07bcd

drivers: i2c: thunderx: Allow driver to work with ACPI defined TWSI controllers · 03a35bc8

Piyush Malgujar authored May 11, 2022

Due to i2c->adap.dev.fwnode not being set, ACPI_COMPANION() wasn't properly
found for TWSI controllers.
Signed-off-by: Szymon Balcerak <sbalcerak@marvell.com>
Signed-off-by: Piyush Malgujar <pmalgujar@marvell.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>

03a35bc8

i2c: ismt: Provide a DMA buffer for Interrupt Cause Logging · 17a0f3ac

Mika Westerberg authored Apr 27, 2022

Before sending a MSI the hardware writes information pertinent to the
interrupt cause to a memory location pointed by SMTICL register. This
memory holds three double words where the least significant bit tells
whether the interrupt cause of master/target/error is valid. The driver
does not use this but we need to set it up because otherwise it will
perform DMA write to the default address (0) and this will cause an
IOMMU fault such as below:

  DMAR: DRHD: handling fault status reg 2
  DMAR: [DMA Write] Request device [00:12.0] PASID ffffffff fault addr 0
        [fault reason 05] PTE Write access is not set

To prevent this from happening, provide a proper DMA buffer for this
that then gets mapped by the IOMMU accordingly.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: From: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>

17a0f3ac

i2c: mt7621: fix missing clk_disable_unprepare() on error in mtk_i2c_probe() · a2537c98

Yang Yingliang authored May 14, 2022

Fix the missing clk_disable_unprepare() before return
from mtk_i2c_probe() in the error handling case.

Fixes: d04913ec ("i2c: mt7621: Add MediaTek MT7621/7628/7688 I2C driver")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Stefan Roese <sr@denx.de>
Signed-off-by: Wolfram Sang <wsa@kernel.org>

a2537c98

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 6c3f5bec

Linus Torvalds authored May 20, 2022

Pull kvm fixes from Paolo Bonzini:
 "ARM:

   - Correctly expose GICv3 support even if no irqchip is created so
     that userspace doesn't observe it changing pointlessly (fixing a
     regression with QEMU)

   - Don't issue a hypercall to set the id-mapped vectors when protected
     mode is enabled (fix for pKVM in combination with CPUs affected by
     Spectre-v3a)

  x86 (five oneliners, of which the most interesting two are):

   - a NULL pointer dereference on INVPCID executed with paging
     disabled, but only if KVM is using shadow paging

   - an incorrect bsearch comparison function which could truncate the
     result and apply PMU event filtering incorrectly. This one comes
     with a selftests update too"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86/mmu: fix NULL pointer dereference on guest INVPCID
  KVM: x86: hyper-v: fix type of valid_bank_mask
  KVM: Free new dirty bitmap if creating a new memslot fails
  KVM: eventfd: Fix false positive RCU usage warning
  selftests: kvm/x86: Verify the pmu event filter matches the correct event
  selftests: kvm/x86: Add the helper function create_pmu_event_filter
  kvm: x86/pmu: Fix the compare function used by the pmu event filter
  KVM: arm64: Don't hypercall before EL2 init
  KVM: arm64: vgic-v3: Consistently populate ID_AA64PFR0_EL1.GIC
  KVM: x86/mmu: Update number of zapped pages even if page list is stable

6c3f5bec

Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux · b3454ce0

Linus Torvalds authored May 20, 2022

Pull clk fixes from Stephen Boyd:
 "Three clk driver fixes to close out the release

   - Fix a divider calculation breaking boot on Broadcom bcm2835

   - Fix HDMI output on Tanix TX6 mini board by reverting a patch

   - Fix clk_set_rate_range() calls on at91 by considering the range
     while calculating the divisor"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: at91: generated: consider range when calculating best rate
  Revert "clk: sunxi-ng: sun6i-rtc: Add support for H6"
  clk: bcm2835: fix bcm2835_clock_choose_div

b3454ce0

Merge tag 'drm-fixes-2022-05-21' of git://anongit.freedesktop.org/drm/drm · 93413c84

Linus Torvalds authored May 20, 2022

Pull drm fixes from Dave Airlie:
 "Few final fixes for 5.18, one amdgpu, core dp mst leak fix, dma-buf
  two fixes, and i915 has a few fixes, one for a regression on older
  GM45 chipsets,

  dma-buf:
   - ioctl userspace use fix
   - fix dma-buf sysfs name generation

  core:
   - dp/mst leak fix

  amdgpu:
   - suspend/resume regression fix

  i915:
   - fix for #5806: GPU hangs and display artifacts on Intel GM45
   - reject DMC with out-of-spec MMIO
   - correctly mark guilty contexts on GuC reset"

* tag 'drm-fixes-2022-05-21' of git://anongit.freedesktop.org/drm/drm:
  drm/i915: Use i915_gem_object_ggtt_pin_ww for reloc_iomap
  drm/amd: Don't reset dGPUs if the system is going to s2idle
  drm/dp/mst: fix a possible memory leak in fetch_monitor_name()
  dma-buf: fix use of DMA_BUF_SET_NAME_{A,B} in userspace
  i915/guc/reset: Make __guc_reset_context aware of guilty engines
  drm/i915/dmc: Add MMIO range restrictions
  dma-buf: ensure unique directory name for dmabuf stats

93413c84

20 May, 2022 11 commits

Merge tag 'drm-intel-fixes-2022-05-20' of... · 64eea680

Dave Airlie authored May 21, 2022

Merge tag 'drm-intel-fixes-2022-05-20' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes

- fix for #5806: GPU hangs and display artifacts on 5.18-rc3 on Intel GM45
- reject DMC with out-of-spec MMIO (Cc: stable)
- correctly mark guilty contexts on GuC reset.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/YocqqvG6PbYx3QgJ@jlahtine-mobl.ger.corp.intel.com

64eea680

Merge tag 'drm-misc-fixes-2022-05-20' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes · 6e4a61cd

Dave Airlie authored May 21, 2022

Fix for a memory leak in dp_mst, a (userspace) build fix for
DMA_BUF_SET_NAME defines and a directory name generation fix for dmabuf
stats
Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20220520072408.cpjzy2taugagvrh7@houat

6e4a61cd

perf: Fix sys_perf_event_open() race against self · 3ac6487e

Peter Zijlstra authored May 20, 2022

Norbert reported that it's possible to race sys_perf_event_open() such
that the looser ends up in another context from the group leader,
triggering many WARNs.

The move_group case checks for races against itself, but the
!move_group case doesn't, seemingly relying on the previous
group_leader->ctx == ctx check. However, that check is racy due to not
holding any locks at that time.

Therefore, re-check the result after acquiring locks and bailing
if they no longer match.

Additionally, clarify the not_move_group case from the
move_group-vs-move_group race.

Fixes: f63a8daa ("perf: Fix event->ctx locking")
Reported-by: Norbert Slusarek <nslusarek@gmx.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

3ac6487e

Merge tag 'gpio-fixes-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux · 3b5e1590

Linus Torvalds authored May 20, 2022

Pull gpio fixes from Bartosz Golaszewski:

 - fix bitops logic in gpio-vf610

 - return an error if the user tries to use inverted polarity in
   gpio-mvebu

* tag 'gpio-fixes-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  gpio: mvebu/pwm: Refuse requests with inverted polarity
  gpio: gpio-vf610: do not touch other bits when set the target bit

3b5e1590

Merge tag 'mmc-v5.18-rc4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc · 317de3db

Linus Torvalds authored May 20, 2022

Pull MMC fix from Ulf Hansson:
 "MMC core:

   - Fix busy polling for MMC_SEND_OP_COND again"

* tag 'mmc-v5.18-rc4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  mmc: core: Fix busy polling for MMC_SEND_OP_COND again

317de3db

Merge tag 'ceph-for-5.18-rc8' of https://github.com/ceph/ceph-client · b851c1f8

Linus Torvalds authored May 20, 2022

Pull ceph fix from Ilya Dryomov:
 "A fix for a nasty use-after-free, marked for stable"

* tag 'ceph-for-5.18-rc8' of https://github.com/ceph/ceph-client:
  libceph: fix misleading ceph_osdc_cancel_request() comment
  libceph: fix potential use-after-free on linger ping and resends

b851c1f8

Merge tag 'riscv-for-linus-5.18-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · 265f34c2

Linus Torvalds authored May 20, 2022

Pull RISC-V fixes from Palmer Dabbelt:

 - fix the fu540-c000 device tree to avoid a schema check failure on the
   DMA node name

 - fix typo in the PolarFire SOC device tree

* tag 'riscv-for-linus-5.18-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: dts: microchip: fix gpio1 reg property typo
  riscv: dts: sifive: fu540-c000: align dma node name with dtschema

265f34c2

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · a956f4e2

Linus Torvalds authored May 20, 2022

Pull arm64 fixes from Will Deacon:
 "Three arm64 fixes for -rc8/final.

  The MTE and stolen time fixes have been doing the rounds for a little
  while, but review and testing feedback was ongoing until earlier this
  week. The kexec fix showed up on Monday and addresses a failure
  observed under Qemu.

  Summary:

   - Add missing write barrier to publish MTE tags before a pte update

   - Fix kexec relocation clobbering its own data structures

   - Fix stolen time crash if a timer IRQ fires during CPU hotplug"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: mte: Ensure the cleared tags are visible before setting the PTE
  arm64: kexec: load from kimage prior to clobbering
  arm64: paravirt: Use RCU read locks to guard stolen_time

a956f4e2

KVM: x86/mmu: fix NULL pointer dereference on guest INVPCID · 9f46c187

Paolo Bonzini authored May 20, 2022

With shadow paging enabled, the INVPCID instruction results in a call
to kvm_mmu_invpcid_gva.  If INVPCID is executed with CR0.PG=0, the
invlpg callback is not set and the result is a NULL pointer dereference.
Fix it trivially by checking for mmu->invlpg before every call.

There are other possibilities:

- check for CR0.PG, because KVM (like all Intel processors after P5)
  flushes guest TLB on CR0.PG changes so that INVPCID/INVLPG are a
  nop with paging disabled

- check for EFER.LMA, because KVM syncs and flushes when switching
  MMU contexts outside of 64-bit mode

All of these are tricky, go for the simple solution.  This is CVE-2022-1789.
Reported-by: Yongkang Jia <kangel@zju.edu.cn>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

9f46c187

KVM: x86: hyper-v: fix type of valid_bank_mask · ea8c66fe

Yury Norov authored May 19, 2022

In kvm_hv_flush_tlb(), valid_bank_mask is declared as unsigned long,
but is used as u64, which is wrong for i386, and has been spotted by
LKP after applying "KVM: x86: hyper-v: replace bitmap_weight() with
hweight64()"

https://lore.kernel.org/lkml/20220510154750.212913-12-yury.norov@gmail.com/

But it's wrong even without that patch because now bitmap_weight()
dereferences a word after valid_bank_mask on i386.

>> include/asm-generic/bitops/const_hweight.h:21:76: warning: right shift count >= width of type
+[-Wshift-count-overflow]
      21 | #define __const_hweight64(w) (__const_hweight32(w) + __const_hweight32((w) >> 32))
         |                                                                            ^~
   include/asm-generic/bitops/const_hweight.h:10:16: note: in definition of macro '__const_hweight8'
      10 |          ((!!((w) & (1ULL << 0))) +     \
         |                ^
   include/asm-generic/bitops/const_hweight.h:20:31: note: in expansion of macro '__const_hweight16'
      20 | #define __const_hweight32(w) (__const_hweight16(w) + __const_hweight16((w) >> 16))
         |                               ^~~~~~~~~~~~~~~~~
   include/asm-generic/bitops/const_hweight.h:21:54: note: in expansion of macro '__const_hweight32'
      21 | #define __const_hweight64(w) (__const_hweight32(w) + __const_hweight32((w) >> 32))
         |                                                      ^~~~~~~~~~~~~~~~~
   include/asm-generic/bitops/const_hweight.h:29:49: note: in expansion of macro '__const_hweight64'
      29 | #define hweight64(w) (__builtin_constant_p(w) ? __const_hweight64(w) : __arch_hweight64(w))
         |                                                 ^~~~~~~~~~~~~~~~~
   arch/x86/kvm/hyperv.c:1983:36: note: in expansion of macro 'hweight64'
    1983 |                 if (hc->var_cnt != hweight64(valid_bank_mask))
         |                                    ^~~~~~~~~

CC: Borislav Petkov <bp@alien8.de>
CC: Dave Hansen <dave.hansen@linux.intel.com>
CC: H. Peter Anvin <hpa@zytor.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: Jim Mattson <jmattson@google.com>
CC: Joerg Roedel <joro@8bytes.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Sean Christopherson <seanjc@google.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Vitaly Kuznetsov <vkuznets@redhat.com>
CC: Wanpeng Li <wanpengli@tencent.com>
CC: kvm@vger.kernel.org
CC: linux-kernel@vger.kernel.org
CC: x86@kernel.org
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Message-Id: <20220519171504.1238724-1-yury.norov@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

ea8c66fe

KVM: Free new dirty bitmap if creating a new memslot fails · c87661f8

Sean Christopherson authored May 18, 2022

Fix a goof in kvm_prepare_memory_region() where KVM fails to free the
new memslot's dirty bitmap during a CREATE action if
kvm_arch_prepare_memory_region() fails.  The logic is supposed to detect
if the bitmap was allocated and thus needs to be freed, versus if the
bitmap was inherited from the old memslot and thus needs to be kept.  If
there is no old memslot, then obviously the bitmap can't have been
inherited

The bug was exposed by commit 86931ff7 ("KVM: x86/mmu: Do not create
SPTEs for GFNs that exceed host.MAXPHYADDR"), which made it trivally easy
for syzkaller to trigger failure during kvm_arch_prepare_memory_region(),
but the bug can be hit other ways too, e.g. due to -ENOMEM when
allocating x86's memslot metadata.

The backtrace from kmemleak:

  __vmalloc_node_range+0xb40/0xbd0 mm/vmalloc.c:3195
  __vmalloc_node mm/vmalloc.c:3232 [inline]
  __vmalloc+0x49/0x50 mm/vmalloc.c:3246
  __vmalloc_array mm/util.c:671 [inline]
  __vcalloc+0x49/0x70 mm/util.c:694
  kvm_alloc_dirty_bitmap virt/kvm/kvm_main.c:1319
  kvm_prepare_memory_region virt/kvm/kvm_main.c:1551
  kvm_set_memslot+0x1bd/0x690 virt/kvm/kvm_main.c:1782
  __kvm_set_memory_region+0x689/0x750 virt/kvm/kvm_main.c:1949
  kvm_set_memory_region virt/kvm/kvm_main.c:1962
  kvm_vm_ioctl_set_memory_region virt/kvm/kvm_main.c:1974
  kvm_vm_ioctl+0x377/0x13a0 virt/kvm/kvm_main.c:4528
  vfs_ioctl fs/ioctl.c:51
  __do_sys_ioctl fs/ioctl.c:870
  __se_sys_ioctl fs/ioctl.c:856
  __x64_sys_ioctl+0xfc/0x140 fs/ioctl.c:856
  do_syscall_x64 arch/x86/entry/common.c:50
  do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
  entry_SYSCALL_64_after_hwframe+0x44/0xae

And the relevant sequence of KVM events:

  ioctl(3, KVM_CREATE_VM, 0)              = 4
  ioctl(4, KVM_SET_USER_MEMORY_REGION, {slot=0,
                                        flags=KVM_MEM_LOG_DIRTY_PAGES,
                                        guest_phys_addr=0x10000000000000,
                                        memory_size=4096,
                                        userspace_addr=0x20fe8000}
       ) = -1 EINVAL (Invalid argument)

Fixes: 244893fa ("KVM: Dynamically allocate "new" memslots from the get-go")
Cc: stable@vger.kernel.org
Reported-by: syzbot+8606b8a9cc97a63f1c87@syzkaller.appspotmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220518003842.1341782-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

c87661f8