Commits · 39fee313fdd489480eb2c453535ab73672c39a27 · Kirill Smelkov / linux

09 Mar, 2024 2 commits

Merge tag 'kvm-x86-guest_memfd_fixes-6.8' of https://github.com/kvm-x86/linux into HEAD · 39fee313

Paolo Bonzini authored Mar 09, 2024

KVM GUEST_MEMFD fixes for 6.8:

 - Make KVM_MEM_GUEST_MEMFD mutually exclusive with KVM_MEM_READONLY to
   avoid creating ABI that KVM can't sanely support.

 - Update documentation for KVM_SW_PROTECTED_VM to make it abundantly
   clear that such VMs are purely a development and testing vehicle, and
   come with zero guarantees.

 - Limit KVM_SW_PROTECTED_VM guests to the TDP MMU, as the long term plan
   is to support confidential VMs with deterministic private memory (SNP
   and TDX) only in the TDP MMU.

 - Fix a bug in a GUEST_MEMFD negative test that resulted in false passes
   when verifying that KVM_MEM_GUEST_MEMFD memslots can't be dirty logged.

39fee313

Merge tag 'kvm-x86-fixes-6.8-2' of https://github.com/kvm-x86/linux into HEAD · 1b6c146d

Paolo Bonzini authored Mar 09, 2024

KVM x86 fixes for 6.8, round 2:

 - When emulating an atomic access, mark the gfn as dirty in the memslot
   to fix a bug where KVM could fail to mark the slot as dirty during live
   migration, ultimately resulting in guest data corruption due to a dirty
   page not being re-copied from the source to the target.

 - Check for mmu_notifier invalidation events before faulting in the pfn,
   and before acquiring mmu_lock, to avoid unnecessary work and lock
   contention.  Contending mmu_lock is especially problematic on preemptible
   kernels, as KVM may yield mmu_lock in response to the contention, which
   severely degrades overall performance due to vCPUs making it difficult
   for the task that triggered invalidation to make forward progress.

   Note, due to another kernel bug, this fix isn't limited to preemtible
   kernels, as any kernel built with CONFIG_PREEMPT_DYNAMIC=y will yield
   contended rwlocks and spinlocks.

   https://lore.kernel.org/all/20240110214723.695930-1-seanjc@google.com

1b6c146d

23 Feb, 2024 7 commits

KVM: x86/mmu: Retry fault before acquiring mmu_lock if mapping is changing · d02c357e

Sean Christopherson authored Feb 21, 2024

Retry page faults without acquiring mmu_lock, and without even faulting
the page into the primary MMU, if the resolved gfn is covered by an active
invalidation. Contending for mmu_lock is especially problematic on
preemptible kernels as the mmu_notifier invalidation task will yield
mmu_lock (see rwlock_needbreak()), delay the in-progress invalidation, and
ultimately increase the latency of resolving the page fault. And in the
worst case scenario, yielding will be accompanied by a remote TLB flush,
e.g. if the invalidation covers a large range of memory and vCPUs are
accessing addresses that were already zapped.

Faulting the page into the primary MMU is similarly problematic, as doing
so may acquire locks that need to be taken for the invalidation to
complete (the primary MMU has finer grained locks than KVM's MMU), and/or
may cause unnecessary churn (getting/putting pages, marking them accessed,
etc).

Alternatively, the yielding issue could be mitigated by teaching KVM's MMU
iterators to perform more work before yielding, but that wouldn't solve
the lock contention and would negatively affect scenarios where a vCPU is
trying to fault in an address that is NOT covered by the in-progress
invalidation.

Add a dedicated lockess version of the range-based retry check to avoid
false positives on the sanity check on start+end WARN, and so that it's
super obvious that checking for a racing invalidation without holding
mmu_lock is unsafe (though obviously useful).

Wrap mmu_invalidate_in_progress in READ_ONCE() to ensure that pre-checking
invalidation in a loop won't put KVM into an infinite loop, e.g. due to
caching the in-progress flag and never seeing it go to '0'.

Force a load of mmu_invalidate_seq as well, even though it isn't strictly
necessary to avoid an infinite loop, as doing so improves the probability
that KVM will detect an invalidation that already completed before
acquiring mmu_lock and bailing anyways.

Do the pre-check even for non-preemptible kernels, as waiting to detect
the invalidation until mmu_lock is held guarantees the vCPU will observe
the worst case latency in terms of handling the fault, and can generate
even more mmu_lock contention. E.g. the vCPU will acquire mmu_lock,
detect retry, drop mmu_lock, re-enter the guest, retake the fault, and
eventually re-acquire mmu_lock. This behavior is also why there are no
new starvation issues due to losing the fairness guarantees provided by
rwlocks: if the vCPU needs to retry, it _must_ drop mmu_lock, i.e. waiting
on mmu_lock doesn't guarantee forward progress in the face of _another_
mmu_notifier invalidation event.

Note, adding READ_ONCE() isn't entirely free, e.g. on x86, the READ_ONCE()
may generate a load into a register instead of doing a direct comparison
(MOV+TEST+Jcc instead of CMP+Jcc), but practically speaking the added cost
is a few bytes of code and maaaaybe a cycle or three.
Reported-by: Yan Zhao <yan.y.zhao@intel.com>
Closes: https://lore.kernel.org/all/ZNnPF4W26ZbAyGto@yzhao56-desk.sh.intel.comReported-by: Friedrich Weber <f.weber@proxmox.com>
Cc: Kai Huang <kai.huang@intel.com>
Cc: Yan Zhao <yan.y.zhao@intel.com>
Cc: Yuan Yao <yuan.yao@linux.intel.com>
Cc: Xu Yilun <yilun.xu@linux.intel.com>
Acked-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Link: https://lore.kernel.org/r/20240222012640.2820927-1-seanjc@google.comSigned-off-by: Sean Christopherson <seanjc@google.com>

d02c357e

KVM: SVM: Flush pages under kvm->lock to fix UAF in svm_register_enc_region() · 5ef1d8c1

Sean Christopherson authored Feb 16, 2024

Do the cache flush of converted pages in svm_register_enc_region() before
dropping kvm->lock to fix use-after-free issues where region and/or its
array of pages could be freed by a different task, e.g. if userspace has
__unregister_enc_region_locked() already queued up for the region.

Note, the "obvious" alternative of using local variables doesn't fully
resolve the bug, as region->pages is also dynamically allocated.  I.e. the
region structure itself would be fine, but region->pages could be freed.

Flushing multiple pages under kvm->lock is unfortunate, but the entire
flow is a rare slow path, and the manual flush is only needed on CPUs that
lack coherency for encrypted memory.

Fixes: 19a23da5 ("Fix unsynchronized access to sev members through svm_register_enc_region")
Reported-by: Gabe Kirkpatrick <gkirkpatrick@google.com>
Cc: Josh Eads <josheads@google.com>
Cc: Peter Gonda <pgonda@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20240217013430.2079561-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

5ef1d8c1

KVM: selftests: Add a testcase to verify GUEST_MEMFD and READONLY are exclusive · 2dfd2383

Sean Christopherson authored Feb 22, 2024

Extend set_memory_region_test's invalid flags subtest to verify that
GUEST_MEMFD is incompatible with READONLY. GUEST_MEMFD doesn't currently
support writes from userspace and KVM doesn't support emulated MMIO on
private accesses, and so KVM is supposed to reject the GUEST_MEMFD+READONLY
in order to avoid configuration that KVM can't support.

Link: https://lore.kernel.org/r/20240222190612.2942589-6-seanjc@google.comSigned-off-by: Sean Christopherson <seanjc@google.com>

2dfd2383

KVM: selftests: Create GUEST_MEMFD for relevant invalid flags testcases · 63e5c5a1

Sean Christopherson authored Feb 22, 2024

Actually create a GUEST_MEMFD instance and pass it to KVM when doing
negative tests for KVM_SET_USER_MEMORY_REGION2 + KVM_MEM_GUEST_MEMFD.
Without a valid GUEST_MEMFD file descriptor, KVM_SET_USER_MEMORY_REGION2
will always fail with -EINVAL, resulting in false passes for any and all
tests of illegal combinations of KVM_MEM_GUEST_MEMFD and other flags.

Fixes: 5d743164 ("KVM: selftests: Add a memory region subtest to validate invalid flags")
Link: https://lore.kernel.org/r/20240222190612.2942589-5-seanjc@google.comSigned-off-by: Sean Christopherson <seanjc@google.com>

63e5c5a1

KVM: x86/mmu: Restrict KVM_SW_PROTECTED_VM to the TDP MMU · a1176ef5

Sean Christopherson authored Feb 22, 2024

Advertise and support software-protected VMs if and only if the TDP MMU is
enabled, i.e. disallow KVM_SW_PROTECTED_VM if TDP is enabled for KVM's
legacy/shadow MMU. TDP support for the shadow MMU is maintenance-only,
e.g. support for TDX and SNP will also be restricted to the TDP MMU.

Fixes: 89ea60c2 ("KVM: x86: Add support for "protected VMs" that can utilize private memory")
Link: https://lore.kernel.org/r/20240222190612.2942589-4-seanjc@google.comSigned-off-by: Sean Christopherson <seanjc@google.com>

a1176ef5

KVM: x86: Update KVM_SW_PROTECTED_VM docs to make it clear they're a WIP · 42269209

Sean Christopherson authored Feb 22, 2024

Rewrite the help message for KVM_SW_PROTECTED_VM to make it clear that
software-protected VMs are a development and testing vehicle for
guest_memfd(), and that attempting to use KVM_SW_PROTECTED_VM for anything
remotely resembling a "real" VM will fail. E.g. any memory accesses from
KVM will incorrectly access shared memory, nested TDP is wildly broken,
and so on and so forth.

Update KVM's API documentation with similar warnings to discourage anyone
from attempting to run anything but selftests with KVM_X86_SW_PROTECTED_VM.

Fixes: 89ea60c2 ("KVM: x86: Add support for "protected VMs" that can utilize private memory")
Link: https://lore.kernel.org/r/20240222190612.2942589-3-seanjc@google.comSigned-off-by: Sean Christopherson <seanjc@google.com>

42269209

KVM: Make KVM_MEM_GUEST_MEMFD mutually exclusive with KVM_MEM_READONLY · e5635922

Sean Christopherson authored Feb 22, 2024

Disallow creating read-only memslots that support GUEST_MEMFD, as
GUEST_MEMFD is fundamentally incompatible with KVM's semantics for
read-only memslots. Read-only memslots allow the userspace VMM to emulate
option ROMs by filling the backing memory with readable, executable code
and data, while triggering emulated MMIO on writes. GUEST_MEMFD doesn't
currently support writes from userspace and KVM doesn't support emulated
MMIO on private accesses, i.e. the guest can only ever read zeros, and
writes will always be treated as errors.

Cc: Fuad Tabba <tabba@google.com>
Cc: Michael Roth <michael.roth@amd.com>
Cc: Isaku Yamahata <isaku.yamahata@gmail.com>
Cc: Yu Zhang <yu.c.zhang@linux.intel.com>
Cc: Chao Peng <chao.p.peng@linux.intel.com>
Fixes: a7800aa8 ("KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory")
Link: https://lore.kernel.org/r/20240222190612.2942589-2-seanjc@google.comSigned-off-by: Sean Christopherson <seanjc@google.com>

e5635922

21 Feb, 2024 3 commits

Merge tag 'kvmarm-fixes-6.8-3' of... · c48617fb

Paolo Bonzini authored Feb 21, 2024

Merge tag 'kvmarm-fixes-6.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 6.8, take #3

- Check for the validity of interrupts handled by a MOVALL
  command

- Check for the validity of interrupts while reading the
  pending state on enabling LPIs.

c48617fb

KVM: arm64: vgic-its: Test for valid IRQ in MOVALL handler · 85a71ee9

Oliver Upton authored Feb 21, 2024

It is possible that an LPI mapped in a different ITS gets unmapped while
handling the MOVALL command. If that is the case, there is no state that
can be migrated to the destination. Silently ignore it and continue
migrating other LPIs.

Cc: stable@vger.kernel.org
Fixes: ff9c1143 ("KVM: arm/arm64: GICv4: Handle MOVALL applied to a vPE")
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240221092732.4126848-3-oliver.upton@linux.devSigned-off-by: Marc Zyngier <maz@kernel.org>

85a71ee9

KVM: arm64: vgic-its: Test for valid IRQ in its_sync_lpi_pending_table() · 8d3a7dfb

Oliver Upton authored Feb 21, 2024

vgic_get_irq() may not return a valid descriptor if there is no ITS that
holds a valid translation for the specified INTID. If that is the case,
it is safe to silently ignore it and continue processing the LPI pending
table.

Cc: stable@vger.kernel.org
Fixes: 33d3bc95 ("KVM: arm64: vgic-its: Read initial LPI pending table")
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240221092732.4126848-2-oliver.upton@linux.devSigned-off-by: Marc Zyngier <maz@kernel.org>

8d3a7dfb

17 Feb, 2024 1 commit

KVM: x86: Mark target gfn of emulated atomic instruction as dirty · 910c57df

Sean Christopherson authored Feb 14, 2024

When emulating an atomic access on behalf of the guest, mark the target
gfn dirty if the CMPXCHG by KVM is attempted and doesn't fault.  This
fixes a bug where KVM effectively corrupts guest memory during live
migration by writing to guest memory without informing userspace that the
page is dirty.

Marking the page dirty got unintentionally dropped when KVM's emulated
CMPXCHG was converted to do a user access.  Before that, KVM explicitly
mapped the guest page into kernel memory, and marked the page dirty during
the unmap phase.

Mark the page dirty even if the CMPXCHG fails, as the old data is written
back on failure, i.e. the page is still written.  The value written is
guaranteed to be the same because the operation is atomic, but KVM's ABI
is that all writes are dirty logged regardless of the value written.  And
more importantly, that's what KVM did before the buggy commit.

Huge kudos to the folks on the Cc list (and many others), who did all the
actual work of triaging and debugging.

Fixes: 1c2361f6 ("KVM: x86: Use __try_cmpxchg_user() to emulate atomic accesses")
Cc: stable@vger.kernel.org
Cc: David Matlack <dmatlack@google.com>
Cc: Pasha Tatashin <tatashin@google.com>
Cc: Michael Krebs <mkrebs@google.com>
base-commit: 6769ea8da8a93ed4630f1ce64df6aafcaabfce64
Reviewed-by: Jim Mattson <jmattson@google.com>
Link: https://lore.kernel.org/r/20240215010004.1456078-2-seanjc@google.comSigned-off-by: Sean Christopherson <seanjc@google.com>

910c57df

16 Feb, 2024 2 commits

Merge tag 'kvmarm-fixes-6.8-2' of... · 9895ceeb

Paolo Bonzini authored Feb 16, 2024

Merge tag 'kvmarm-fixes-6.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 6.8, take #2

- Avoid dropping the page refcount twice when freeing an unlinked
  page-table subtree.

9895ceeb

Merge tag 'kvmarm-fixes-6.8-1' of... · 8046fa5f

Paolo Bonzini authored Feb 16, 2024

Merge tag 'kvmarm-fixes-6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 6.8, take #1

- Don't source the VFIO Kconfig twice

- Fix protected-mode locking order between kvm and vcpus

8046fa5f

14 Feb, 2024 3 commits

Merge tag 'kvm-riscv-fixes-6.8-1' of https://github.com/kvm-riscv/linux into HEAD · e67391ca
Paolo Bonzini authored Feb 14, 2024
```
KVM/riscv fixes for 6.8, take #1

- Fix steal-time related sparse warnings
```
e67391ca

Merge tag 'kvm-x86-selftests-6.8-rcN' of https://github.com/kvm-x86/linux into HEAD · 2f8ebe43

Paolo Bonzini authored Feb 14, 2024

KVM selftests fixes/cleanups (and one KVM x86 cleanup) for 6.8:

 - Remove redundant newlines from error messages.

 - Delete an unused variable in the AMX test (which causes build failures when
   compiling with -Werror).

 - Fail instead of skipping tests if open(), e.g. of /dev/kvm, fails with an
   error code other than ENOENT (a Hyper-V selftest bug resulted in an EMFILE,
   and the test eventually got skipped).

 - Fix TSC related bugs in several Hyper-V selftests.

 - Fix a bug in the dirty ring logging test where a sem_post() could be left
   pending across multiple runs, resulting in incorrect synchronization between
   the main thread and the vCPU worker thread.

 - Relax the dirty log split test's assertions on 4KiB mappings to fix false
   positives due to the number of mappings for memslot 0 (used for code and
   data that is NOT being dirty logged) changing, e.g. due to NUMA balancing.

 - Have KVM's gtod_is_based_on_tsc() return "bool" instead of an "int" (the
   function generates boolean values, and all callers treat the return value as
   a bool).

2f8ebe43

Merge tag 'kvm-x86-fixes-6.8-rcN' of https://github.com/kvm-x86/linux into HEAD · 22d0bc07

Paolo Bonzini authored Feb 14, 2024

KVM x86 fixes for 6.8:

 - Make a KVM_REQ_NMI request while handling KVM_SET_VCPU_EVENTS if and only
   if the incoming events->nmi.pending is non-zero.  If the target vCPU is in
   the UNITIALIZED state, the spurious request will result in KVM exiting to
   userspace, which in turn causes QEMU to constantly acquire and release
   QEMU's global mutex, to the point where the BSP is unable to make forward
   progress.

 - Fix a type (u8 versus u64) goof that results in pmu->fixed_ctr_ctrl being
   incorrectly truncated, and ultimately causes KVM to think a fixed counter
   has already been disabled (KVM thinks the old value is '0').

 - Fix a stack leak in KVM_GET_MSRS where a failed MSR read from userspace
   that is ultimately ignored due to ignore_msrs=true doesn't zero the output
   as intended.

22d0bc07

13 Feb, 2024 1 commit

KVM: arm64: Fix double-free following kvm_pgtable_stage2_free_unlinked() · c60d847b

Will Deacon authored Feb 12, 2024

kvm_pgtable_stage2_free_unlinked() does the final put_page() on the
root page of the sub-tree before returning, so remove the additional
put_page() invocations in the callers.

Cc: Ricardo Koller <ricarkol@google.com>
Fixes: f6a27d6d ("KVM: arm64: Drop last page ref in kvm_pgtable_stage2_free_removed()")
Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240212193052.27765-1-will@kernel.org

c60d847b

11 Feb, 2024 3 commits

Linux 6.8-rc4 · 841c3516
Linus Torvalds authored Feb 11, 2024

841c3516

Merge tag 'timers_urgent_for_v6.8_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 2766f59c

Linus Torvalds authored Feb 11, 2024

Pull timer fix from Borislav Petkov:

 - Make sure a warning is issued when a hrtimer gets queued after the
   timers have been migrated on the CPU down path and thus said timer
   will get ignored

* tag 'timers_urgent_for_v6.8_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  hrtimer: Report offline hrtimer enqueue

2766f59c

Merge tag 'x86_urgent_for_v6.8_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · c021e191

Linus Torvalds authored Feb 11, 2024

Pull x86 fixes from Borislav Petkov:

 - Correct the minimum CPU family for Transmeta Crusoe in Kconfig so
   that such hw can boot again

 - Do not take into accout XSTATE buffer size info supplied by userspace
   when constructing a sigreturn frame

 - Switch get_/put_user* to EX_TYPE_UACCESS exception handling when an
   MCE is encountered so that it can be properly recovered from instead
   of simply panicking

* tag 'x86_urgent_for_v6.8_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/Kconfig: Transmeta Crusoe is CPU family 5, not 6
  x86/fpu: Stop relying on userspace for info to fault in xsave buffer
  x86/lib: Revert to _ASM_EXTABLE_UA() for {get,put}_user() fixups

c021e191

10 Feb, 2024 8 commits

Merge tag 'mm-hotfixes-stable-2024-02-10-11-16' of... · 7521f258

Linus Torvalds authored Feb 10, 2024

Merge tag 'mm-hotfixes-stable-2024-02-10-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "21 hotfixes. 12 are cc:stable and the remainder pertain to post-6.7
  issues or aren't considered to be needed in earlier kernel versions"

* tag 'mm-hotfixes-stable-2024-02-10-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (21 commits)
  nilfs2: fix potential bug in end_buffer_async_write
  mm/damon/sysfs-schemes: fix wrong DAMOS tried regions update timeout setup
  nilfs2: fix hang in nilfs_lookup_dirty_data_buffers()
  MAINTAINERS: Leo Yan has moved
  mm/zswap: don't return LRU_SKIP if we have dropped lru lock
  fs,hugetlb: fix NULL pointer dereference in hugetlbs_fill_super
  mailmap: switch email address for John Moon
  mm: zswap: fix objcg use-after-free in entry destruction
  mm/madvise: don't forget to leave lazy MMU mode in madvise_cold_or_pageout_pte_range()
  arch/arm/mm: fix major fault accounting when retrying under per-VMA lock
  selftests: core: include linux/close_range.h for CLOSE_RANGE_* macros
  mm/memory-failure: fix crash in split_huge_page_to_list from soft_offline_page
  mm: memcg: optimize parent iteration in memcg_rstat_updated()
  nilfs2: fix data corruption in dsync block recovery for small block sizes
  mm/userfaultfd: UFFDIO_MOVE implementation should use ptep_get()
  exit: wait_task_zombie: kill the no longer necessary spin_lock_irq(siglock)
  fs/proc: do_task_stat: use sig->stats_lock to gather the threads/children stats
  fs/proc: do_task_stat: move thread_group_cputime_adjusted() outside of lock_task_sighand()
  getrusage: use sig->stats_lock rather than lock_task_sighand()
  getrusage: move thread_group_cputime_adjusted() outside of lock_task_sighand()
  ...

7521f258

Merge tag 'block-6.8-2024-02-10' of git://git.kernel.dk/linux · a5b6244c

Linus Torvalds authored Feb 10, 2024

Pull block fixes from Jens Axboe:

 - NVMe pull request via Keith:
     - Update a potentially stale firmware attribute (Maurizio)
     - Fixes for the recent verbose error logging (Keith, Chaitanya)
     - Protection information payload size fix for passthrough (Francis)

 - Fix for a queue freezing issue in virtblk (Yi)

 - blk-iocost underflow fix (Tejun)

 - blk-wbt task detection fix (Jan)

* tag 'block-6.8-2024-02-10' of git://git.kernel.dk/linux:
  virtio-blk: Ensure no requests in virtqueues before deleting vqs.
  blk-iocost: Fix an UBSAN shift-out-of-bounds warning
  nvme: use ns->head->pi_size instead of t10_pi_tuple structure size
  nvme-core: fix comment to reflect right functions
  nvme: move passthrough logging attribute to head
  blk-wbt: Fix detection of dirty-throttled tasks
  nvme-host: fix the updating of the firmware version

a5b6244c

Merge tag 'firewire-fixes-6.8-rc4' of... · a38ff5bb

Linus Torvalds authored Feb 10, 2024

Merge tag 'firewire-fixes-6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394

Pull firewire fix from Takashi Sakamoto:
"A change to accelerate the device detection step in some cases.

In the self-identification step after bus-reset, all nodes in the same
bus broadcast selfID packet including the value of gap count. The
value is related to the cable hops between nodes, and used to
calculate the subaction gap and the arbitration reset gap.

When each node has the different value of the gap count, the
asynchronous communication between them is unreliable, since an
asynchronous transaction could be interrupted by another asynchronous
transaction before completion. The gap count inconsistency can be
resolved by several ways; e.g. the transfer of PHY configuration
packet and generation of bus-reset.

The current implementation of firewire stack can correctly detect the
gap count inconsistency, however the recovery action from the
inconsistency tends to be delayed after reading configuration ROM of
root node. This results in the long time to probe devices in some
combinations of hardware.

Here the stack is changed to schedule the action as soon as possible"

* tag 'firewire-fixes-6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
firewire: core: send bus reset promptly on gap count error

a38ff5bb

Merge tag '6.8-rc3-ksmbd-server-fixes' of git://git.samba.org/ksmbd · 5a7ec870

Linus Torvalds authored Feb 10, 2024

Pull smb server fixes from Steve French:
 "Two ksmbd server fixes:

   - memory leak fix

   - a minor kernel-doc fix"

* tag '6.8-rc3-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
  ksmbd: free aux buffer if ksmbd_iov_pin_rsp_read fails
  ksmbd: Add kernel-doc for ksmbd_extract_sharename() function

5a7ec870

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 4a7bbe75

Linus Torvalds authored Feb 09, 2024

Pull SCSI fixes from James Bottomley:
 "Three small driver fixes and one core fix.

  The core fix being a fixup to the one in the last pull request which
  didn't entirely move checking of scsi_host_busy() out from under the
  host lock"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: ufs: core: Remove the ufshcd_release() in ufshcd_err_handling_prepare()
  scsi: ufs: core: Fix shift issue in ufshcd_clear_cmd()
  scsi: lpfc: Use unsigned type for num_sge
  scsi: core: Move scsi_host_busy() out of host lock if it is for per-command

4a7bbe75

Merge tag '6.8-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 · ca00c700

Linus Torvalds authored Feb 09, 2024

Pull smb client fixes from Steve French:

 - reconnect fix

 - multichannel channel selection fix

 - minor mount warning fix

 - reparse point fix

 - null pointer check improvement

* tag '6.8-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb3: clarify mount warning
  cifs: handle cases where multiple sessions share connection
  cifs: change tcon status when need_reconnect is set on it
  smb: client: set correct d_type for reparse points under DFS mounts
  smb3: add missing null server pointer check

ca00c700

Merge tag 'ceph-for-6.8-rc4' of https://github.com/ceph/ceph-client · e1e3f530

Linus Torvalds authored Feb 09, 2024

Pull ceph fixes from Ilya Dryomov:
 "Some fscrypt-related fixups (sparse reads are used only for encrypted
  files) and two cap handling fixes from Xiubo and Rishabh"

* tag 'ceph-for-6.8-rc4' of https://github.com/ceph/ceph-client:
  ceph: always check dir caps asynchronously
  ceph: prevent use-after-free in encode_cap_msg()
  ceph: always set initial i_blkbits to CEPH_FSCRYPT_BLOCK_SHIFT
  libceph: just wait for more data to be available on the socket
  libceph: rename read_sparse_msg_*() to read_partial_sparse_msg_*()
  libceph: fail sparse-read if the data length doesn't match

e1e3f530

Merge tag 'ntfs3_for_6.8' of https://github.com/Paragon-Software-Group/linux-ntfs3 · a2343df3

Linus Torvalds authored Feb 09, 2024

Pull ntfs3 fixes from Konstantin Komarov:
 "Fixed:
   - size update for compressed file
   - some logic errors, overflows
   - memory leak
   - some code was refactored

  Added:
   - implement super_operations::shutdown

  Improved:
   - alternative boot processing
   - reduced stack usage"

* tag 'ntfs3_for_6.8' of https://github.com/Paragon-Software-Group/linux-ntfs3: (28 commits)
  fs/ntfs3: Slightly simplify ntfs_inode_printk()
  fs/ntfs3: Add ioctl operation for directories (FITRIM)
  fs/ntfs3: Fix oob in ntfs_listxattr
  fs/ntfs3: Fix an NULL dereference bug
  fs/ntfs3: Update inode->i_size after success write into compressed file
  fs/ntfs3: Fixed overflow check in mi_enum_attr()
  fs/ntfs3: Correct function is_rst_area_valid
  fs/ntfs3: Use i_size_read and i_size_write
  fs/ntfs3: Prevent generic message "attempt to access beyond end of device"
  fs/ntfs3: use non-movable memory for ntfs3 MFT buffer cache
  fs/ntfs3: Use kvfree to free memory allocated by kvmalloc
  fs/ntfs3: Disable ATTR_LIST_ENTRY size check
  fs/ntfs3: Fix c/mtime typo
  fs/ntfs3: Add NULL ptr dereference checking at the end of attr_allocate_frame()
  fs/ntfs3: Add and fix comments
  fs/ntfs3: ntfs3_forced_shutdown use int instead of bool
  fs/ntfs3: Implement super_operations::shutdown
  fs/ntfs3: Drop suid and sgid bits as a part of fpunch
  fs/ntfs3: Add file_modified
  fs/ntfs3: Correct use bh_read
  ...

a2343df3

09 Feb, 2024 10 commits

work around gcc bugs with 'asm goto' with outputs · 4356e9f8

Linus Torvalds authored Feb 09, 2024

We've had issues with gcc and 'asm goto' before, and we created a
'asm_volatile_goto()' macro for that in the past: see commits
3f0116c3 ("compiler/gcc4: Add quirk for 'asm goto' miscompilation
bug") and a9f18034 ("compiler/gcc4: Make quirk for
asm_volatile_goto() unconditional").

Then, much later, we ended up removing the workaround in commit
43c249ea ("compiler-gcc.h: remove ancient workaround for gcc PR
58670") because we no longer supported building the kernel with the
affected gcc versions, but we left the macro uses around.

Now, Sean Christopherson reports a new version of a very similar
problem, which is fixed by re-applying that ancient workaround.  But the
problem in question is limited to only the 'asm goto with outputs'
cases, so instead of re-introducing the old workaround as-is, let's
rename and limit the workaround to just that much less common case.

It looks like there are at least two separate issues that all hit in
this area:

 (a) some versions of gcc don't mark the asm goto as 'volatile' when it
     has outputs:

        https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98619
        https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110420

     which is easy to work around by just adding the 'volatile' by hand.

 (b) Internal compiler errors:

        https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110422

     which are worked around by adding the extra empty 'asm' as a
     barrier, as in the original workaround.

but the problem Sean sees may be a third thing since it involves bad
code generation (not an ICE) even with the manually added 'volatile'.

but the same old workaround works for this case, even if this feels a
bit like voodoo programming and may only be hiding the issue.
Reported-and-tested-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/all/20240208220604.140859-1-seanjc@google.com/
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Uros Bizjak <ubizjak@gmail.com>
Cc: Jakub Jelinek <jakub@redhat.com>
Cc: Andrew Pinski <quic_apinski@quicinc.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

4356e9f8

smb3: clarify mount warning · a5cc98eb

Steve French authored Feb 06, 2024

When a user tries to use the "sec=krb5p" mount parameter to encrypt
data on connection to a server (when authenticating with Kerberos), we
indicate that it is not supported, but do not note the equivalent
recommended mount parameter ("sec=krb5,seal") which turns on encryption
for that mount (and uses Kerberos for auth). Update the warning message.
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

a5cc98eb

cifs: handle cases where multiple sessions share connection · a39c757b

Shyam Prasad N authored Feb 06, 2024

Based on our implementation of multichannel, it is entirely
possible that a server struct may not be found in any channel
of an SMB session.

In such cases, we should be prepared to move on and search for
the server struct in the next session.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

a39c757b

cifs: change tcon status when need_reconnect is set on it · c6e02eef

Shyam Prasad N authored Feb 06, 2024

When a tcon is marked for need_reconnect, the intention
is to have it reconnected.

This change adjusts tcon->status in cifs_tree_connect
when need_reconnect is set. Also, this change has a minor
correction in resetting need_reconnect on success. It makes
sure that it is done with tc_lock held.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

c6e02eef

Merge tag 'riscv-for-linus-6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · 9ed18b0b

Linus Torvalds authored Feb 09, 2024

Pull RISC-V fixes from Palmer Dabbelt:

 - fix missing TLB flush during early boot on SPARSEMEM_VMEMMAP
   configurations

 - fixes to correctly implement the break-before-make behavior requried
   by the ISA for NAPOT mappings

 - fix a missing TLB flush on intermediate mapping changes

 - fix build warning about a missing declaration of overflow_stack

 - fix performace regression related to incorrect tracking of completed
   batch TLB flushes

* tag 'riscv-for-linus-6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: Fix arch_tlbbatch_flush() by clearing the batch cpumask
  riscv: declare overflow_stack as exported from traps.c
  riscv: Fix arch_hugetlb_migration_supported() for NAPOT
  riscv: Flush the tlb when a page directory is freed
  riscv: Fix hugetlb_mask_last_page() when NAPOT is enabled
  riscv: Fix set_huge_pte_at() for NAPOT mapping
  riscv: mm: execute local TLB flush after populating vmemmap

9ed18b0b

Merge tag 'trace-v6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · ca8a6673

Linus Torvalds authored Feb 09, 2024

Pull tracing fixes from Steven Rostedt:

- Fix broken direct trampolines being called when another callback is
attached the same function.

ARM 64 does not support FTRACE_WITH_REGS, and when it added direct
trampoline calls from ftrace, it removed the "WITH_REGS" flag from
the ftrace_ops for direct trampolines. This broke x86 as x86 requires
direct trampolines to have WITH_REGS.

This wasn't noticed because direct trampolines work as long as the
function it is attached to is not shared with other callbacks (like
the function tracer). When there are other callbacks, a helper
trampoline is called, to call all the non direct callbacks and when
it returns, the direct trampoline is called.

For x86, the direct trampoline sets a flag in the regs field to tell
the x86 specific code to call the direct trampoline. But this only
works if the ftrace_ops had WITH_REGS set. ARM does things
differently that does not require this. For now, set WITH_REGS if the
arch supports WITH_REGS (which ARM does not), and this makes it work
for both ARM64 and x86.

- Fix wasted memory in the saved_cmdlines logic.

The saved_cmdlines is a cache that maps PIDs to COMMs that tracing
can use. Most trace events only save the PID in the event. The
saved_cmdlines file lists PIDs to COMMs so that the tracing tools can
show an actual name and not just a PID for each event. There's an
array of PIDs that map to a small set of saved COMM strings. The
array is set to PID_MAX_DEFAULT which is usually set to 32768. When a
PID comes in, it will add itself to this array along with the index
into the COMM array (note if the system allows more than
PID_MAX_DEFAULT, this cache is similar to cache lines as an update of
a PID that has the same PID_MAX_DEFAULT bits set will flush out
another task with the same matching bits set).

A while ago, the size of this cache was changed to be dynamic and the
array was moved into a structure and created with kmalloc(). But this
new structure had the size of 131104 bytes, or 0x20020 in hex. As
kmalloc allocates in powers of two, it was actually allocating
0x40000 bytes (262144) leaving 131040 bytes of wasted memory. The
last element of this structure was a pointer to the COMM string array
which defaulted to just saving 128 COMMs.

By changing the last field of this structure to a variable length
string, and just having it round up to fill the allocated memory, the
default size of the saved COMM cache is now 8190. This not only uses
the wasted space, but actually saves space by removing the extra
allocation for the COMM names.

* tag 'trace-v6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Fix wasted memory in saved_cmdlines logic
ftrace: Fix DIRECT_CALLS to use SAVE_REGS by default

ca8a6673

Merge tag 'probes-fixes-v6.8-rc3' of... · 6dc512a0

Linus Torvalds authored Feb 09, 2024

Merge tag 'probes-fixes-v6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull probes fixes from Masami Hiramatsu:

 - remove unnecessary initial values of kprobes local variables

 - probe-events parser bug fixes:

    - calculate the argument size and format string after setting type
      information from BTF, because BTF can change the size and format
      string.

    - show $comm parse error correctly instead of failing silently.

* tag 'probes-fixes-v6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  kprobes: Remove unnecessary initial values of variables
  tracing/probes: Fix to set arg size and fmt after setting type from BTF
  tracing/probes: Fix to show a parse error for bad type for $comm

6dc512a0

Merge tag 'efi-fixes-for-v6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi · e6f39a90

Linus Torvalds authored Feb 09, 2024

Pull EFI fixes from Ard Biesheuvel:
 "The only notable change here is the patch that changes the way we deal
  with spurious errors from the EFI memory attribute protocol. This will
  be backported to v6.6, and is intended to ensure that we will not
  paint ourselves into a corner when we tighten this further in order to
  comply with MS requirements on signed EFI code.

  Note that this protocol does not currently exist in x86 production
  systems in the field, only in Microsoft's fork of OVMF, but it will be
  mandatory for Windows logo certification for x86 PCs in the future.

   - Tighten ELF relocation checks on the RISC-V EFI stub

   - Give up if the new EFI memory attributes protocol fails spuriously
     on x86

   - Take care not to place the kernel in the lowest 16 MB of DRAM on
     x86

   - Omit special purpose EFI memory from memblock

   - Some fixes for the CXL CPER reporting code

   - Make the PE/COFF layout of mixed-mode capable images comply with a
     strict interpretation of the spec"

* tag 'efi-fixes-for-v6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  x86/efistub: Use 1:1 file:memory mapping for PE/COFF .compat section
  cxl/trace: Remove unnecessary memcpy's
  cxl/cper: Fix errant CPER prints for CXL events
  efi: Don't add memblocks for soft-reserved memory
  efi: runtime: Fix potential overflow of soft-reserved region size
  efi/libstub: Add one kernel-doc comment
  x86/efistub: Avoid placing the kernel below LOAD_PHYSICAL_ADDR
  x86/efistub: Give up if memory attribute protocol returns an error
  riscv/efistub: Tighten ELF relocation check
  riscv/efistub: Ensure GP-relative addressing is not used

e6f39a90

Merge tag 'pci-v6.8-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci · 5ddfc246

Linus Torvalds authored Feb 09, 2024

Pull pci fixes from Bjorn Helgaas:

 - Fix an unintentional truncation of DWC MSI-X address to 32 bits and
   update similar MSI code to match (Dan Carpenter)

* tag 'pci-v6.8-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
  PCI: dwc: Clean up dw_pcie_ep_raise_msi_irq() alignment
  PCI: dwc: Fix a 64bit bug in dw_pcie_ep_raise_msix_irq()

5ddfc246

Merge tag 'hwmon-for-v6.8-rc4' of... · 5ca243c2

Linus Torvalds authored Feb 09, 2024

Merge tag 'hwmon-for-v6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull hwmon fixes from Guenter Roeck:

 - coretemp: Various fixes, and increase number of supported CPU cores

 - aspeed-pwm-tacho: Add missing mutex protection

* tag 'hwmon-for-v6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (coretemp) Enlarge per package core count limit
  hwmon: (coretemp) Fix bogus core_id to attr name mapping
  hwmon: (coretemp) Fix out-of-bounds memory access
  hwmon: (aspeed-pwm-tacho) mutex for tach reading

5ca243c2