1. 09 Mar, 2024 3 commits
    • SEV: disable SEV-ES DebugSwap by default · 5abf6dce
      Paolo Bonzini authored
      The DebugSwap feature of SEV-ES provides a way for confidential guests to use
      data breakpoints.  However, because the status of the DebugSwap feature is
      recorded in the VMSA, enabling it by default invalidates the attestation
      signatures.  In 6.10 we will introduce a new API to create SEV VMs that
      will allow enabling DebugSwap based on what the user tells KVM to do.
      At the same time, we will change the legacy KVM_SEV_ES_INIT API to never
      enable DebugSwap.
      
      For compatibility with kernels that pre-date the introduction of DebugSwap,
      as well as with those where KVM_SEV_ES_INIT will never enable it, do not enable
      the feature by default.  If anybody wants to use it, for now they can enable
      the sev_es_debug_swap_enabled module parameter, but this will result in a
      warning.
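      The following is a userspace toy model of the reasoning above, for
      illustration only. The DebugSwap bit is part of the VMSA, so any digest
      taken over the VMSA changes when the bit is set. Several pieces are
      assumptions of this sketch and are not KVM code:
      - the bit position (bit 5 of SEV_FEATURES);
      - the toy_ names;
      - the FNV-1a digest, which stands in for the real launch measurement.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: DebugSwap is bit 5 of the VMSA's SEV_FEATURES field. */
#define TOY_SEV_FEAT_DEBUG_SWAP (UINT64_C(1) << 5)

/* Hypothetical stand-in for the VMSA, reduced to the one relevant field. */
struct toy_vmsa {
	uint64_t sev_features;
};

/* Hypothetical model of the module parameter: off unless the admin opts in. */
static bool toy_debug_swap_enabled = false;

/* Build the feature word that would be recorded in the VMSA. */
static uint64_t toy_sev_features(void)
{
	uint64_t features = 0;

	if (toy_debug_swap_enabled) {
		fprintf(stderr, "warning: DebugSwap changes the VMSA contents\n");
		features |= TOY_SEV_FEAT_DEBUG_SWAP;
	}
	return features;
}

/* Toy FNV-1a digest standing in for the launch measurement of the VMSA. */
static uint64_t toy_measure(const struct toy_vmsa *vmsa)
{
	const unsigned char *p = (const unsigned char *)vmsa;
	uint64_t h = UINT64_C(14695981039346656037);

	for (size_t i = 0; i < sizeof(*vmsa); i++) {
		h ^= p[i];
		h *= UINT64_C(1099511628211);
	}
	return h;
}
```

      With the default setting, toy_sev_features() returns 0. Setting
      toy_debug_swap_enabled prints the warning, adds the bit, and yields a
      different toy_measure() result. That difference is the property that
      would break attestation if the feature were on by default.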
      
      Fixes: d1f85fbe ("KVM: SEV: Enable data breakpoints in SEV-ES")
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • Merge tag 'kvm-x86-guest_memfd_fixes-6.8' of https://github.com/kvm-x86/linux into HEAD · 39fee313
      Paolo Bonzini authored
      KVM GUEST_MEMFD fixes for 6.8:
      
       - Make KVM_MEM_GUEST_MEMFD mutually exclusive with KVM_MEM_READONLY to
         avoid creating ABI that KVM can't sanely support.
      
       - Update documentation for KVM_SW_PROTECTED_VM to make it abundantly
         clear that such VMs are purely a development and testing vehicle, and
         come with zero guarantees.
      
       - Limit KVM_SW_PROTECTED_VM guests to the TDP MMU, as the long term plan
         is to support confidential VMs with deterministic private memory (SNP
         and TDX) only in the TDP MMU.
      
       - Fix a bug in a GUEST_MEMFD negative test that resulted in false passes
         when verifying that KVM_MEM_GUEST_MEMFD memslots can't be dirty logged.
    • Merge tag 'kvm-x86-fixes-6.8-2' of https://github.com/kvm-x86/linux into HEAD · 1b6c146d
      Paolo Bonzini authored
      KVM x86 fixes for 6.8, round 2:
      
       - When emulating an atomic access, mark the gfn as dirty in the memslot
         to fix a bug where KVM could fail to mark the slot as dirty during live
         migration, ultimately resulting in guest data corruption due to a dirty
         page not being re-copied from the source to the target.
      
       - Check for mmu_notifier invalidation events before faulting in the pfn,
         and before acquiring mmu_lock, to avoid unnecessary work and lock
         contention.  Contending mmu_lock is especially problematic on preemptible
         kernels, as KVM may yield mmu_lock in response to the contention, which
         severely degrades overall performance due to vCPUs making it difficult
         for the task that triggered invalidation to make forward progress.
      
         Note, due to another kernel bug, this fix isn't limited to preemptible
         kernels, as any kernel built with CONFIG_PREEMPT_DYNAMIC=y will yield
         contended rwlocks and spinlocks.
      
         https://lore.kernel.org/all/20240110214723.695930-1-seanjc@google.com
  2. 23 Feb, 2024 7 commits
    • KVM: x86/mmu: Retry fault before acquiring mmu_lock if mapping is changing · d02c357e
      Sean Christopherson authored
      Retry page faults without acquiring mmu_lock, and without even faulting
      the page into the primary MMU, if the resolved gfn is covered by an active
      invalidation.  Contending for mmu_lock is especially problematic on
      preemptible kernels as the mmu_notifier invalidation task will yield
      mmu_lock (see rwlock_needbreak()), delay the in-progress invalidation, and
      ultimately increase the latency of resolving the page fault.  And in the
      worst case scenario, yielding will be accompanied by a remote TLB flush,
      e.g. if the invalidation covers a large range of memory and vCPUs are
      accessing addresses that were already zapped.
      
      Faulting the page into the primary MMU is similarly problematic, as doing
      so may acquire locks that need to be taken for the invalidation to
      complete (the primary MMU has finer grained locks than KVM's MMU), and/or
      may cause unnecessary churn (getting/putting pages, marking them accessed,
      etc).
      
      Alternatively, the yielding issue could be mitigated by teaching KVM's MMU
      iterators to perform more work before yielding, but that wouldn't solve
      the lock contention and would negatively affect scenarios where a vCPU is
      trying to fault in an address that is NOT covered by the in-progress
      invalidation.
      
      Add a dedicated lockless version of the range-based retry check to avoid
      false positives from the WARN that sanity-checks start+end, and so that it's
      super obvious that checking for a racing invalidation without holding
      mmu_lock is unsafe (though obviously useful).
      
      Wrap mmu_invalidate_in_progress in READ_ONCE() to ensure that pre-checking
      invalidation in a loop won't put KVM into an infinite loop, e.g. due to
      caching the in-progress flag and never seeing it go to '0'.
      
      Force a load of mmu_invalidate_seq as well, even though it isn't strictly
      necessary to avoid an infinite loop, as doing so improves the probability
      that KVM will detect an invalidation that already completed before
      acquiring mmu_lock and bailing anyways.
      
      Do the pre-check even for non-preemptible kernels, as waiting to detect
      the invalidation until mmu_lock is held guarantees the vCPU will observe
      the worst case latency in terms of handling the fault, and can generate
      even more mmu_lock contention.  E.g. the vCPU will acquire mmu_lock,
      detect retry, drop mmu_lock, re-enter the guest, retake the fault, and
      eventually re-acquire mmu_lock.  This behavior is also why there are no
      new starvation issues due to losing the fairness guarantees provided by
      rwlocks: if the vCPU needs to retry, it _must_ drop mmu_lock, i.e. waiting
      on mmu_lock doesn't guarantee forward progress in the face of _another_
      mmu_notifier invalidation event.
      
      Note, adding READ_ONCE() isn't entirely free, e.g. on x86, the READ_ONCE()
      may generate a load into a register instead of doing a direct comparison
      (MOV+TEST+Jcc instead of CMP+Jcc), but practically speaking the added cost
      is a few bytes of code and maaaaybe a cycle or three.
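      Below is a simplified userspace sketch of the pre-check described above.
      It rests on these assumptions:
      - C11 relaxed atomic loads play the role of READ_ONCE();
      - an atomic_flag spinlock stands in for mmu_lock;
      - every toy_ name is hypothetical.
      The fault path bails out before touching the lock when the gfn falls
      inside an in-progress invalidation. It repeats the same check once the
      lock is held, because the lockless answer is only a hint.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified model of KVM's invalidation tracking. */
struct toy_kvm {
	atomic_ulong invalidate_in_progress; /* nonzero while an invalidation runs */
	atomic_ulong invalidate_seq;         /* bumped when an invalidation completes */
	_Atomic uint64_t range_start;        /* invalidated gfns: [range_start, range_end) */
	_Atomic uint64_t range_end;
	atomic_flag mmu_lock;                /* toy spinlock standing in for mmu_lock */
	unsigned long lock_acquisitions;     /* bookkeeping for the example only */
};

enum toy_fault_result { TOY_FAULT_RETRY, TOY_FAULT_HANDLED };

/*
 * Does an in-progress invalidation cover @gfn, or has one completed since
 * @mmu_seq was sampled?  The atomic loads force a fresh read on every call,
 * so a caller polling in a loop will eventually see in-progress drop to 0.
 */
static bool toy_invalidation_hits(struct toy_kvm *kvm, unsigned long mmu_seq,
				  uint64_t gfn)
{
	if (atomic_load_explicit(&kvm->invalidate_in_progress, memory_order_relaxed) &&
	    gfn >= atomic_load_explicit(&kvm->range_start, memory_order_relaxed) &&
	    gfn < atomic_load_explicit(&kvm->range_end, memory_order_relaxed))
		return true;

	return atomic_load_explicit(&kvm->invalidate_seq, memory_order_relaxed) != mmu_seq;
}

static enum toy_fault_result toy_handle_fault(struct toy_kvm *kvm, uint64_t gfn)
{
	unsigned long mmu_seq =
		atomic_load_explicit(&kvm->invalidate_seq, memory_order_acquire);
	bool retry;

	/* Lockless pre-check: bail before faulting the page in or taking the lock. */
	if (toy_invalidation_hits(kvm, mmu_seq, gfn))
		return TOY_FAULT_RETRY;

	/* (Faulting the page into the primary MMU would happen here.) */

	while (atomic_flag_test_and_set_explicit(&kvm->mmu_lock, memory_order_acquire))
		; /* spin */
	kvm->lock_acquisitions++;
	/* The authoritative check is still made with the lock held. */
	retry = toy_invalidation_hits(kvm, mmu_seq, gfn);
	atomic_flag_clear_explicit(&kvm->mmu_lock, memory_order_release);

	return retry ? TOY_FAULT_RETRY : TOY_FAULT_HANDLED;
}
```

      In this model, a fault on a gfn inside an active invalidation returns
      TOY_FAULT_RETRY without ever touching mmu_lock. A fault outside the
      range still takes the lock and completes.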
      
      Reported-by: Yan Zhao <yan.y.zhao@intel.com>
      Closes: https://lore.kernel.org/all/ZNnPF4W26ZbAyGto@yzhao56-desk.sh.intel.com
      Reported-by: Friedrich Weber <f.weber@proxmox.com>
      Cc: Kai Huang <kai.huang@intel.com>
      Cc: Yan Zhao <yan.y.zhao@intel.com>
      Cc: Yuan Yao <yuan.yao@linux.intel.com>
      Cc: Xu Yilun <yilun.xu@linux.intel.com>
      Acked-by: Kai Huang <kai.huang@intel.com>
      Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
      Link: https://lore.kernel.org/r/20240222012640.2820927-1-seanjc@google.com
      Signed-off-by: Sean Christopherson <seanjc@google.com>
    • KVM: SVM: Flush pages under kvm->lock to fix UAF in svm_register_enc_region() · 5ef1d8c1
      Sean Christopherson authored
      Do the cache flush of converted pages in svm_register_enc_region() before
      dropping kvm->lock to fix use-after-free issues where region and/or its
      array of pages could be freed by a different task, e.g. if userspace has
      __unregister_enc_region_locked() already queued up for the region.
      
      Note, the "obvious" alternative of using local variables doesn't fully
      resolve the bug, as region->pages is also dynamically allocated.  I.e. the
      region structure itself would be fine, but region->pages could be freed.
      
      Flushing multiple pages under kvm->lock is unfortunate, but the entire
      flow is a rare slow path, and the manual flush is only needed on CPUs that
      lack coherency for encrypted memory.
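      Here is a hedged userspace sketch of the ordering fix. The flush loop runs
      before the lock is released. A concurrent unregister path frees both the
      region and its pages array under the same lock, so it cannot free them
      mid-flush. The toy_ structures, the spinlock, and the counting "flush"
      are hypothetical stand-ins, not the SVM code.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-ins for an encrypted region and the per-VM state. */
struct toy_region {
	unsigned long npages;
	unsigned long *pages; /* dynamically allocated, like region->pages */
};

struct toy_vm {
	atomic_flag lock;          /* stands in for kvm->lock */
	struct toy_region *region; /* single region only, for simplicity */
	unsigned long flushed;     /* counts "cache flushes" for the example */
};

static void toy_lock(struct toy_vm *vm)
{
	while (atomic_flag_test_and_set_explicit(&vm->lock, memory_order_acquire))
		; /* spin */
}

static void toy_unlock(struct toy_vm *vm)
{
	atomic_flag_clear_explicit(&vm->lock, memory_order_release);
}

static int toy_register_region(struct toy_vm *vm, unsigned long npages)
{
	struct toy_region *region = malloc(sizeof(*region));

	if (!region)
		return -1;
	region->pages = calloc(npages, sizeof(*region->pages));
	if (!region->pages) {
		free(region);
		return -1;
	}
	region->npages = npages;

	toy_lock(vm);
	vm->region = region; /* from here on, an unregister could free it */
	/* Flush before unlocking: after toy_unlock(), region may already be gone. */
	for (unsigned long i = 0; i < region->npages; i++)
		vm->flushed++; /* stands in for flushing region->pages[i] */
	toy_unlock(vm);
	return 0;
}

static void toy_unregister_region(struct toy_vm *vm)
{
	toy_lock(vm);
	if (vm->region) {
		free(vm->region->pages);
		free(vm->region);
		vm->region = NULL;
	}
	toy_unlock(vm);
}
```

      The sketch also shows why copying region into a local variable is not
      enough. The pages array is a separate allocation, and the unregister
      path frees it as well.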
      
      Fixes: 19a23da5 ("Fix unsynchronized access to sev members through svm_register_enc_region")
      Reported-by: Gabe Kirkpatrick <gkirkpatrick@google.com>
      Cc: Josh Eads <josheads@google.com>
      Cc: Peter Gonda <pgonda@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20240217013430.2079561-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: selftests: Add a testcase to verify GUEST_MEMFD and READONLY are exclusive · 2dfd2383
      Sean Christopherson authored
      Extend set_memory_region_test's invalid flags subtest to verify that
      GUEST_MEMFD is incompatible with READONLY.  GUEST_MEMFD doesn't currently
      support writes from userspace and KVM doesn't support emulated MMIO on
      private accesses, and so KVM is supposed to reject the GUEST_MEMFD+READONLY
      combination in order to avoid a configuration that KVM can't support.
      
      Link: https://lore.kernel.org/r/20240222190612.2942589-6-seanjc@google.com
      Signed-off-by: Sean Christopherson <seanjc@google.com>
    • KVM: selftests: Create GUEST_MEMFD for relevant invalid flags testcases · 63e5c5a1
      Sean Christopherson authored
      Actually create a GUEST_MEMFD instance and pass it to KVM when doing
      negative tests for KVM_SET_USER_MEMORY_REGION2 + KVM_MEM_GUEST_MEMFD.
      Without a valid GUEST_MEMFD file descriptor, KVM_SET_USER_MEMORY_REGION2
      will always fail with -EINVAL, resulting in false passes for any and all
      tests of illegal combinations of KVM_MEM_GUEST_MEMFD and other flags.
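      The false pass can be modeled in a few lines. In this hypothetical
      sketch, toy_set_region() rejects a bad guest_memfd descriptor before it
      ever looks at the flags. The enforce_exclusive parameter simulates a KVM
      with or without the READONLY check. The flag values are local copies
      assumed to match the uapi header.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MEM_READONLY    (1u << 1)
#define TOY_MEM_GUEST_MEMFD (1u << 2)

/* Hypothetical model of the ioctl: fd validation happens first. */
static int toy_set_region(uint32_t flags, int gmem_fd, bool enforce_exclusive)
{
	if ((flags & TOY_MEM_GUEST_MEMFD) && gmem_fd < 0)
		return -EINVAL; /* fails regardless of which other flags were passed */

	if (enforce_exclusive && (flags & TOY_MEM_GUEST_MEMFD) &&
	    (flags & TOY_MEM_READONLY))
		return -EINVAL;

	return 0;
}

/* A negative test "passes" when KVM rejects GUEST_MEMFD|READONLY. */
static bool toy_negative_test_passes(int gmem_fd, bool enforce_exclusive)
{
	return toy_set_region(TOY_MEM_GUEST_MEMFD | TOY_MEM_READONLY, gmem_fd,
			      enforce_exclusive) == -EINVAL;
}
```

      With gmem_fd = -1, the test passes even when enforce_exclusive is false,
      so it cannot detect a missing check. With a real descriptor, it passes
      only when the check exists.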
      
      Fixes: 5d743164 ("KVM: selftests: Add a memory region subtest to validate invalid flags")
      Link: https://lore.kernel.org/r/20240222190612.2942589-5-seanjc@google.com
      Signed-off-by: Sean Christopherson <seanjc@google.com>
    • KVM: x86/mmu: Restrict KVM_SW_PROTECTED_VM to the TDP MMU · a1176ef5
      Sean Christopherson authored
      Advertise and support software-protected VMs if and only if the TDP MMU is
      enabled, i.e. disallow KVM_SW_PROTECTED_VM if TDP is enabled for KVM's
      legacy/shadow MMU.  TDP support for the shadow MMU is maintenance-only,
      e.g. support for TDX and SNP will also be restricted to the TDP MMU.
      
      Fixes: 89ea60c2 ("KVM: x86: Add support for "protected VMs" that can utilize private memory")
      Link: https://lore.kernel.org/r/20240222190612.2942589-4-seanjc@google.com
      Signed-off-by: Sean Christopherson <seanjc@google.com>
    • KVM: x86: Update KVM_SW_PROTECTED_VM docs to make it clear they're a WIP · 42269209
      Sean Christopherson authored
      Rewrite the help message for KVM_SW_PROTECTED_VM to make it clear that
      software-protected VMs are a development and testing vehicle for
      guest_memfd(), and that attempting to use KVM_SW_PROTECTED_VM for anything
      remotely resembling a "real" VM will fail.  E.g. any memory accesses from
      KVM will incorrectly access shared memory, nested TDP is wildly broken,
      and so on and so forth.
      
      Update KVM's API documentation with similar warnings to discourage anyone
      from attempting to run anything but selftests with KVM_X86_SW_PROTECTED_VM.
      
      Fixes: 89ea60c2 ("KVM: x86: Add support for "protected VMs" that can utilize private memory")
      Link: https://lore.kernel.org/r/20240222190612.2942589-3-seanjc@google.com
      Signed-off-by: Sean Christopherson <seanjc@google.com>
    • KVM: Make KVM_MEM_GUEST_MEMFD mutually exclusive with KVM_MEM_READONLY · e5635922
      Sean Christopherson authored
      Disallow creating read-only memslots that support GUEST_MEMFD, as
      GUEST_MEMFD is fundamentally incompatible with KVM's semantics for
      read-only memslots.  Read-only memslots allow the userspace VMM to emulate
      option ROMs by filling the backing memory with readable, executable code
      and data, while triggering emulated MMIO on writes.  GUEST_MEMFD doesn't
      currently support writes from userspace and KVM doesn't support emulated
      MMIO on private accesses, i.e. the guest can only ever read zeros, and
      writes will always be treated as errors.
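      Below is a sketch of how such a flag check can be structured. It is
      loosely modeled on the rule stated above and is not the actual KVM code.
      The has_private_mem and has_readonly_mem inputs and the SK_ names are
      assumptions. The flag values are local copies assumed to match the uapi
      header. READONLY joins the valid set only when GUEST_MEMFD is absent, so
      requesting both fails with -EINVAL. Dirty logging is likewise refused for
      GUEST_MEMFD slots, as the selftest fix in this series expects.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the memslot flags; values assumed to match the uapi header. */
#define SK_MEM_LOG_DIRTY_PAGES (1u << 0)
#define SK_MEM_READONLY        (1u << 1)
#define SK_MEM_GUEST_MEMFD     (1u << 2)

/*
 * Sketch of memslot flag validation.  @has_private_mem and @has_readonly_mem
 * are assumed inputs describing what the VM/architecture supports.
 */
static int sketch_check_memslot_flags(uint32_t flags, bool has_private_mem,
				      bool has_readonly_mem)
{
	uint32_t valid = SK_MEM_LOG_DIRTY_PAGES;

	if (has_private_mem)
		valid |= SK_MEM_GUEST_MEMFD;

	/* guest_memfd-backed slots can't be dirty logged. */
	if (flags & SK_MEM_GUEST_MEMFD)
		valid &= ~SK_MEM_LOG_DIRTY_PAGES;

	/* READONLY is valid only when GUEST_MEMFD is not requested. */
	if (has_readonly_mem && !(flags & SK_MEM_GUEST_MEMFD))
		valid |= SK_MEM_READONLY;

	return (flags & ~valid) ? -EINVAL : 0;
}
```

      Building the valid mask from the requested flags keeps each rule local
      and avoids a growing list of pairwise exclusions.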
      
      Cc: Fuad Tabba <tabba@google.com>
      Cc: Michael Roth <michael.roth@amd.com>
      Cc: Isaku Yamahata <isaku.yamahata@gmail.com>
      Cc: Yu Zhang <yu.c.zhang@linux.intel.com>
      Cc: Chao Peng <chao.p.peng@linux.intel.com>
      Fixes: a7800aa8 ("KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory")
      Link: https://lore.kernel.org/r/20240222190612.2942589-2-seanjc@google.com
      Signed-off-by: Sean Christopherson <seanjc@google.com>
  3. 21 Feb, 2024 3 commits
  4. 17 Feb, 2024 1 commit
    • KVM: x86: Mark target gfn of emulated atomic instruction as dirty · 910c57df
      Sean Christopherson authored
      When emulating an atomic access on behalf of the guest, mark the target
      gfn dirty if the CMPXCHG by KVM is attempted and doesn't fault.  This
      fixes a bug where KVM effectively corrupts guest memory during live
      migration by writing to guest memory without informing userspace that the
      page is dirty.
      
      Marking the page dirty got unintentionally dropped when KVM's emulated
      CMPXCHG was converted to do a user access.  Before that, KVM explicitly
      mapped the guest page into kernel memory, and marked the page dirty during
      the unmap phase.
      
      Mark the page dirty even if the CMPXCHG fails, as the old data is written
      back on failure, i.e. the page is still written.  The value written is
      guaranteed to be the same because the operation is atomic, but KVM's ABI
      is that all writes are dirty logged regardless of the value written.  And
      more importantly, that's what KVM did before the buggy commit.
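      A userspace sketch of the rule above follows. It assumes C11 atomics. The
      toy emulated CMPXCHG marks the gfn dirty whenever the access is attempted
      without faulting, whether or not the exchange succeeds. The toy_guest
      layout, the 64-gfn bitmap, and the -1 fault convention are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NR_GFNS 64

/* Hypothetical guest memory: one 64-bit word per gfn plus a dirty bitmap. */
struct toy_guest {
	_Atomic uint64_t mem[TOY_NR_GFNS];
	uint64_t dirty_bitmap; /* bit N set => gfn N must be re-copied */
};

/*
 * Emulate a guest CMPXCHG on @gfn.  Returns -1 on a (toy) fault, 1 if the
 * exchange happened, 0 if it didn't; on failure *@old receives the current
 * value, as with atomic_compare_exchange_strong().
 */
static int toy_emulate_cmpxchg(struct toy_guest *g, unsigned gfn,
			       uint64_t *old, uint64_t new_val)
{
	bool exchanged;

	if (gfn >= TOY_NR_GFNS)
		return -1; /* faulted: nothing written, nothing to log */

	exchanged = atomic_compare_exchange_strong(&g->mem[gfn], old, new_val);

	/*
	 * Mark dirty even on failure: a real x86 CMPXCHG writes the old value
	 * back, and KVM's ABI dirty-logs every write regardless of the value.
	 */
	g->dirty_bitmap |= UINT64_C(1) << gfn;

	return exchanged ? 1 : 0;
}
```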
      
      Huge kudos to the folks on the Cc list (and many others), who did all the
      actual work of triaging and debugging.
      
      Fixes: 1c2361f6 ("KVM: x86: Use __try_cmpxchg_user() to emulate atomic accesses")
      Cc: stable@vger.kernel.org
      Cc: David Matlack <dmatlack@google.com>
      Cc: Pasha Tatashin <tatashin@google.com>
      Cc: Michael Krebs <mkrebs@google.com>
      base-commit: 6769ea8da8a93ed4630f1ce64df6aafcaabfce64
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Link: https://lore.kernel.org/r/20240215010004.1456078-2-seanjc@google.com
      Signed-off-by: Sean Christopherson <seanjc@google.com>
  5. 16 Feb, 2024 2 commits
  6. 14 Feb, 2024 3 commits
    • Merge tag 'kvm-riscv-fixes-6.8-1' of https://github.com/kvm-riscv/linux into HEAD · e67391ca
      Paolo Bonzini authored
      KVM/riscv fixes for 6.8, take #1
      
      - Fix steal-time related sparse warnings
    • Merge tag 'kvm-x86-selftests-6.8-rcN' of https://github.com/kvm-x86/linux into HEAD · 2f8ebe43
      Paolo Bonzini authored
      KVM selftests fixes/cleanups (and one KVM x86 cleanup) for 6.8:
      
       - Remove redundant newlines from error messages.
      
       - Delete an unused variable in the AMX test (which causes build failures when
         compiling with -Werror).
      
       - Fail instead of skipping tests if open(), e.g. of /dev/kvm, fails with an
         error code other than ENOENT (a Hyper-V selftest bug resulted in an EMFILE,
         and the test eventually got skipped).
      
       - Fix TSC related bugs in several Hyper-V selftests.
      
       - Fix a bug in the dirty ring logging test where a sem_post() could be left
         pending across multiple runs, resulting in incorrect synchronization between
         the main thread and the vCPU worker thread.
      
       - Relax the dirty log split test's assertions on 4KiB mappings to fix false
         positives due to the number of mappings for memslot 0 (used for code and
         data that is NOT being dirty logged) changing, e.g. due to NUMA balancing.
      
       - Have KVM's gtod_is_based_on_tsc() return "bool" instead of an "int" (the
         function generates boolean values, and all callers treat the return value as
         a bool).
    • Merge tag 'kvm-x86-fixes-6.8-rcN' of https://github.com/kvm-x86/linux into HEAD · 22d0bc07
      Paolo Bonzini authored
      KVM x86 fixes for 6.8:
      
       - Make a KVM_REQ_NMI request while handling KVM_SET_VCPU_EVENTS if and only
         if the incoming events->nmi.pending is non-zero.  If the target vCPU is in
         the UNINITIALIZED state, the spurious request will result in KVM exiting to
         userspace, which in turn causes QEMU to constantly acquire and release
         QEMU's global mutex, to the point where the BSP is unable to make forward
         progress.
      
       - Fix a type (u8 versus u64) goof that results in pmu->fixed_ctr_ctrl being
         incorrectly truncated, and ultimately causes KVM to think a fixed counter
         has already been disabled (KVM thinks the old value is '0').
      
       - Fix a stack leak in KVM_GET_MSRS where a failed MSR read from userspace
         that is ultimately ignored due to ignore_msrs=true doesn't zero the output
         as intended.
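      The u8 versus u64 goof can be reproduced in isolation. This hypothetical
      sketch assumes the usual IA32_FIXED_CTR_CTRL layout, in which each fixed
      counter owns a 4-bit field of the control value (counter 2 at bits 8-11).
      Keeping the old value in a uint8_t therefore drops every field above
      counter 1. Counter 2 then looks disabled even when it is enabled. The
      function names are illustrative only.

```c
#include <stdint.h>

/* Each fixed counter owns a 4-bit field of the control value. */
static unsigned toy_fixed_ctrl_field(uint64_t ctrl, unsigned idx)
{
	return (unsigned)((ctrl >> (idx * 4)) & 0xf);
}

/* Buggy variant: the old value is kept in a u8, truncating bits 8 and up. */
static unsigned toy_old_field_buggy(uint64_t fixed_ctr_ctrl, unsigned idx)
{
	uint8_t old = (uint8_t)fixed_ctr_ctrl;

	return toy_fixed_ctrl_field(old, idx);
}

/* Fixed variant: the old value keeps its full 64-bit width. */
static unsigned toy_old_field_fixed(uint64_t fixed_ctr_ctrl, unsigned idx)
{
	uint64_t old = fixed_ctr_ctrl;

	return toy_fixed_ctrl_field(old, idx);
}
```

      For a control value of 0xb00, the buggy variant reports counter 2's old
      field as 0, while the fixed variant reports 0xb.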
  7. 13 Feb, 2024 1 commit
  8. 11 Feb, 2024 3 commits
  9. 10 Feb, 2024 8 commits
    • Merge tag 'mm-hotfixes-stable-2024-02-10-11-16' of... · 7521f258
      Linus Torvalds authored
      Merge tag 'mm-hotfixes-stable-2024-02-10-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
      
      Pull misc fixes from Andrew Morton:
       "21 hotfixes. 12 are cc:stable and the remainder pertain to post-6.7
        issues or aren't considered to be needed in earlier kernel versions"
      
      * tag 'mm-hotfixes-stable-2024-02-10-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (21 commits)
        nilfs2: fix potential bug in end_buffer_async_write
        mm/damon/sysfs-schemes: fix wrong DAMOS tried regions update timeout setup
        nilfs2: fix hang in nilfs_lookup_dirty_data_buffers()
        MAINTAINERS: Leo Yan has moved
        mm/zswap: don't return LRU_SKIP if we have dropped lru lock
        fs,hugetlb: fix NULL pointer dereference in hugetlbs_fill_super
        mailmap: switch email address for John Moon
        mm: zswap: fix objcg use-after-free in entry destruction
        mm/madvise: don't forget to leave lazy MMU mode in madvise_cold_or_pageout_pte_range()
        arch/arm/mm: fix major fault accounting when retrying under per-VMA lock
        selftests: core: include linux/close_range.h for CLOSE_RANGE_* macros
        mm/memory-failure: fix crash in split_huge_page_to_list from soft_offline_page
        mm: memcg: optimize parent iteration in memcg_rstat_updated()
        nilfs2: fix data corruption in dsync block recovery for small block sizes
        mm/userfaultfd: UFFDIO_MOVE implementation should use ptep_get()
        exit: wait_task_zombie: kill the no longer necessary spin_lock_irq(siglock)
        fs/proc: do_task_stat: use sig->stats_lock to gather the threads/children stats
        fs/proc: do_task_stat: move thread_group_cputime_adjusted() outside of lock_task_sighand()
        getrusage: use sig->stats_lock rather than lock_task_sighand()
        getrusage: move thread_group_cputime_adjusted() outside of lock_task_sighand()
        ...
    • Merge tag 'block-6.8-2024-02-10' of git://git.kernel.dk/linux · a5b6244c
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
      
       - NVMe pull request via Keith:
           - Update a potentially stale firmware attribute (Maurizio)
           - Fixes for the recent verbose error logging (Keith, Chaitanya)
           - Protection information payload size fix for passthrough (Francis)
      
       - Fix for a queue freezing issue in virtblk (Yi)
      
       - blk-iocost underflow fix (Tejun)
      
       - blk-wbt task detection fix (Jan)
      
      * tag 'block-6.8-2024-02-10' of git://git.kernel.dk/linux:
        virtio-blk: Ensure no requests in virtqueues before deleting vqs.
        blk-iocost: Fix an UBSAN shift-out-of-bounds warning
        nvme: use ns->head->pi_size instead of t10_pi_tuple structure size
        nvme-core: fix comment to reflect right functions
        nvme: move passthrough logging attribute to head
        blk-wbt: Fix detection of dirty-throttled tasks
        nvme-host: fix the updating of the firmware version
    • Merge tag 'firewire-fixes-6.8-rc4' of... · a38ff5bb
      Linus Torvalds authored
      Merge tag 'firewire-fixes-6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394
      
      Pull firewire fix from Takashi Sakamoto:
       "A change to accelerate the device detection step in some cases.
      
        In the self-identification step after a bus reset, all nodes on the
        same bus broadcast a selfID packet that includes their gap count. The
        value is related to the number of cable hops between nodes, and is
        used to calculate the subaction gap and the arbitration reset gap.
      
        When nodes have different gap count values, asynchronous
        communication between them is unreliable, since an asynchronous
        transaction could be interrupted by another asynchronous transaction
        before completion. The gap count inconsistency can be resolved in
        several ways, e.g. by transmitting a PHY configuration packet and
        generating a bus reset.
      
        The current implementation of the firewire stack can correctly detect
        the gap count inconsistency; however, the recovery action tends to be
        delayed until after the configuration ROM of the root node has been
        read. This results in long device probe times with some combinations
        of hardware.
      
        Here the stack is changed to schedule the action as soon as possible"
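      Here is a toy illustration of the detection step described above; it is
      not the firewire core's code. It collects the gap count each node
      reported in its selfID packet. On any disagreement, it requests a bus
      reset right away instead of waiting for the root node's configuration ROM
      to be read. The function names and the convention that 0 means
      "inconsistent" are assumptions of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns the common gap count, or 0 (this sketch's "inconsistent" marker). */
static unsigned toy_common_gap_count(const unsigned *gap_counts, size_t n)
{
	if (n == 0)
		return 0;
	for (size_t i = 1; i < n; i++)
		if (gap_counts[i] != gap_counts[0])
			return 0;
	return gap_counts[0];
}

/* Decide right after self-identification whether to reset immediately. */
static bool toy_reset_now(const unsigned *gap_counts, size_t n)
{
	return toy_common_gap_count(gap_counts, n) == 0;
}
```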
      
      * tag 'firewire-fixes-6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
        firewire: core: send bus reset promptly on gap count error
    • Merge tag '6.8-rc3-ksmbd-server-fixes' of git://git.samba.org/ksmbd · 5a7ec870
      Linus Torvalds authored
      Pull smb server fixes from Steve French:
       "Two ksmbd server fixes:
      
         - memory leak fix
      
         - a minor kernel-doc fix"
      
      * tag '6.8-rc3-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
        ksmbd: free aux buffer if ksmbd_iov_pin_rsp_read fails
        ksmbd: Add kernel-doc for ksmbd_extract_sharename() function
    • Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 4a7bbe75
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "Three small driver fixes and one core fix.
      
        The core fix being a fixup to the one in the last pull request which
        didn't entirely move checking of scsi_host_busy() out from under the
        host lock"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: ufs: core: Remove the ufshcd_release() in ufshcd_err_handling_prepare()
        scsi: ufs: core: Fix shift issue in ufshcd_clear_cmd()
        scsi: lpfc: Use unsigned type for num_sge
        scsi: core: Move scsi_host_busy() out of host lock if it is for per-command
    • Merge tag '6.8-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 · ca00c700
      Linus Torvalds authored
      Pull smb client fixes from Steve French:
      
       - reconnect fix
      
       - multichannel channel selection fix
      
       - minor mount warning fix
      
       - reparse point fix
      
       - null pointer check improvement
      
      * tag '6.8-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
        smb3: clarify mount warning
        cifs: handle cases where multiple sessions share connection
        cifs: change tcon status when need_reconnect is set on it
        smb: client: set correct d_type for reparse points under DFS mounts
        smb3: add missing null server pointer check
    • Merge tag 'ceph-for-6.8-rc4' of https://github.com/ceph/ceph-client · e1e3f530
      Linus Torvalds authored
      Pull ceph fixes from Ilya Dryomov:
       "Some fscrypt-related fixups (sparse reads are used only for encrypted
        files) and two cap handling fixes from Xiubo and Rishabh"
      
      * tag 'ceph-for-6.8-rc4' of https://github.com/ceph/ceph-client:
        ceph: always check dir caps asynchronously
        ceph: prevent use-after-free in encode_cap_msg()
        ceph: always set initial i_blkbits to CEPH_FSCRYPT_BLOCK_SHIFT
        libceph: just wait for more data to be available on the socket
        libceph: rename read_sparse_msg_*() to read_partial_sparse_msg_*()
        libceph: fail sparse-read if the data length doesn't match
    • Merge tag 'ntfs3_for_6.8' of https://github.com/Paragon-Software-Group/linux-ntfs3 · a2343df3
      Linus Torvalds authored
      Pull ntfs3 fixes from Konstantin Komarov:
       "Fixed:
         - size update for compressed file
         - some logic errors, overflows
         - memory leak
         - some code was refactored
      
        Added:
         - implement super_operations::shutdown
      
        Improved:
         - alternative boot processing
         - reduced stack usage"
      
      * tag 'ntfs3_for_6.8' of https://github.com/Paragon-Software-Group/linux-ntfs3: (28 commits)
        fs/ntfs3: Slightly simplify ntfs_inode_printk()
        fs/ntfs3: Add ioctl operation for directories (FITRIM)
        fs/ntfs3: Fix oob in ntfs_listxattr
        fs/ntfs3: Fix an NULL dereference bug
        fs/ntfs3: Update inode->i_size after success write into compressed file
        fs/ntfs3: Fixed overflow check in mi_enum_attr()
        fs/ntfs3: Correct function is_rst_area_valid
        fs/ntfs3: Use i_size_read and i_size_write
        fs/ntfs3: Prevent generic message "attempt to access beyond end of device"
        fs/ntfs3: use non-movable memory for ntfs3 MFT buffer cache
        fs/ntfs3: Use kvfree to free memory allocated by kvmalloc
        fs/ntfs3: Disable ATTR_LIST_ENTRY size check
        fs/ntfs3: Fix c/mtime typo
        fs/ntfs3: Add NULL ptr dereference checking at the end of attr_allocate_frame()
        fs/ntfs3: Add and fix comments
        fs/ntfs3: ntfs3_forced_shutdown use int instead of bool
        fs/ntfs3: Implement super_operations::shutdown
        fs/ntfs3: Drop suid and sgid bits as a part of fpunch
        fs/ntfs3: Add file_modified
        fs/ntfs3: Correct use bh_read
        ...
  10. 09 Feb, 2024 9 commits
    • work around gcc bugs with 'asm goto' with outputs · 4356e9f8
      Linus Torvalds authored
      We've had issues with gcc and 'asm goto' before, and we created a
      'asm_volatile_goto()' macro for that in the past: see commits
      3f0116c3 ("compiler/gcc4: Add quirk for 'asm goto' miscompilation
      bug") and a9f18034 ("compiler/gcc4: Make quirk for
      asm_volatile_goto() unconditional").
      
      Then, much later, we ended up removing the workaround in commit
      43c249ea ("compiler-gcc.h: remove ancient workaround for gcc PR
      58670") because we no longer supported building the kernel with the
      affected gcc versions, but we left the macro uses around.
      
      Now, Sean Christopherson reports a new version of a very similar
      problem, which is fixed by re-applying that ancient workaround.  But the
      problem in question is limited to only the 'asm goto with outputs'
      cases, so instead of re-introducing the old workaround as-is, let's
      rename and limit the workaround to just that much less common case.
      
      It looks like there are at least two separate issues that all hit in
      this area:
      
       (a) some versions of gcc don't mark the asm goto as 'volatile' when it
           has outputs:
      
              https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98619
              https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110420
      
           which is easy to work around by just adding the 'volatile' by hand.
      
       (b) Internal compiler errors:
      
              https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110422
      
           which are worked around by adding the extra empty 'asm' as a
           barrier, as in the original workaround.
      
      But the problem Sean sees may be a third thing, since it involves bad
      code generation (not an ICE) even with the manually added 'volatile'.
      
      The same old workaround does work for this case, even if this feels a
      bit like voodoo programming and may only be hiding the issue.
      
      Reported-and-tested-by: Sean Christopherson <seanjc@google.com>
      Link: https://lore.kernel.org/all/20240208220604.140859-1-seanjc@google.com/
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Uros Bizjak <ubizjak@gmail.com>
      Cc: Jakub Jelinek <jakub@redhat.com>
      Cc: Andrew Pinski <quic_apinski@quicinc.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • smb3: clarify mount warning · a5cc98eb
      Steve French authored
      When a user tries to use the "sec=krb5p" mount parameter to encrypt
      data on connection to a server (when authenticating with Kerberos), we
      indicate that it is not supported, but do not note the equivalent
      recommended mount parameter ("sec=krb5,seal") which turns on encryption
      for that mount (and uses Kerberos for auth).  Update the warning message.
      
      Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
    • cifs: handle cases where multiple sessions share connection · a39c757b
      Shyam Prasad N authored
      Based on our implementation of multichannel, it is entirely
      possible that a server struct may not be found in any channel
      of an SMB session.
      
      In such cases, we should be prepared to move on and search for
      the server struct in the next session.
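      The following is a hedged sketch of the behavior described above. When
      several sessions share one connection, a lookup that finds no matching
      channel in one session moves on to the next session. It does not treat
      the miss as an error. The toy_ types and the channel-index helper are
      hypothetical and are not the cifs client's code.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_CHANS 4
#define TOY_INVAL_CHAN_INDEX ((size_t)-1)

/* Hypothetical stand-ins for a TCP connection and an SMB session. */
struct toy_server { int id; };

struct toy_session {
	size_t nchans;
	const struct toy_server *chans[TOY_MAX_CHANS];
	bool need_reconnect[TOY_MAX_CHANS];
};

/* Index of the channel of @ses that uses @server, or TOY_INVAL_CHAN_INDEX. */
static size_t toy_chan_index(const struct toy_session *ses,
			     const struct toy_server *server)
{
	for (size_t i = 0; i < ses->nchans; i++)
		if (ses->chans[i] == server)
			return i;
	return TOY_INVAL_CHAN_INDEX;
}

/* Mark @server's channel for reconnect in every session that uses it. */
static int toy_mark_for_reconnect(struct toy_session *sessions, size_t nses,
				  const struct toy_server *server)
{
	int marked = 0;

	for (size_t s = 0; s < nses; s++) {
		size_t idx = toy_chan_index(&sessions[s], server);

		if (idx == TOY_INVAL_CHAN_INDEX)
			continue; /* not found here: expected, try the next session */
		sessions[s].need_reconnect[idx] = true;
		marked++;
	}
	return marked;
}
```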
      
      Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
    • cifs: change tcon status when need_reconnect is set on it · c6e02eef
      Shyam Prasad N authored
      When a tcon is marked for need_reconnect, the intention
      is to have it reconnected.
      
      This change adjusts tcon->status in cifs_tree_connect
      when need_reconnect is set. It also makes a minor
      correction to how need_reconnect is reset on success,
      ensuring that this is done with tc_lock held.
      Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
    • Merge tag 'riscv-for-linus-6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · 9ed18b0b
      Linus Torvalds authored
      Pull RISC-V fixes from Palmer Dabbelt:
      
       - fix missing TLB flush during early boot on SPARSEMEM_VMEMMAP
         configurations
      
       - fixes to correctly implement the break-before-make behavior required
         by the ISA for NAPOT mappings
      
       - fix a missing TLB flush on intermediate mapping changes
      
       - fix build warning about a missing declaration of overflow_stack
      
       - fix performance regression related to incorrect tracking of completed
         batch TLB flushes
      
      * tag 'riscv-for-linus-6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
        riscv: Fix arch_tlbbatch_flush() by clearing the batch cpumask
        riscv: declare overflow_stack as exported from traps.c
        riscv: Fix arch_hugetlb_migration_supported() for NAPOT
        riscv: Flush the tlb when a page directory is freed
        riscv: Fix hugetlb_mask_last_page() when NAPOT is enabled
        riscv: Fix set_huge_pte_at() for NAPOT mapping
        riscv: mm: execute local TLB flush after populating vmemmap
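The batched-flush fix in this pull (clearing the batch cpumask in arch_tlbbatch_flush()) can be illustrated with a toy model in C. The 64-bit mask, `batch_add()`, `batch_flush()`, and the `flushes_sent` counter are hypothetical; the real code works on a struct cpumask and issues remote TLB flushes. Clearing the mask after each flush keeps later batches from re-flushing CPUs that were already handled, which is the kind of redundant work that shows up as a performance regression rather than a correctness bug.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical batch: bit n set means CPU n may need a TLB flush. */
struct tlb_batch { uint64_t cpumask; };

static unsigned flushes_sent; /* counts simulated per-CPU flush requests */

static void batch_add(struct tlb_batch *b, unsigned cpu)
{
    assert(cpu < 64);
    b->cpumask |= UINT64_C(1) << cpu;
}

static void batch_flush(struct tlb_batch *b)
{
    for (unsigned cpu = 0; cpu < 64; cpu++) {
        if (b->cpumask & (UINT64_C(1) << cpu))
            flushes_sent++; /* stand-in for a remote flush on this CPU */
    }
    b->cpumask = 0; /* clear the batch so the next one starts empty */
}
```

Without the final `b->cpumask = 0`, a second batch containing only CPU 5 would also re-flush every CPU from the first batch.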
    • Merge tag 'trace-v6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · ca8a6673
      Linus Torvalds authored
      Pull tracing fixes from Steven Rostedt:
      
       - Fix broken direct trampolines being called when another callback is
         attached to the same function.
      
         ARM 64 does not support FTRACE_WITH_REGS, and when it added direct
         trampoline calls from ftrace, it removed the "WITH_REGS" flag from
         the ftrace_ops for direct trampolines. This broke x86 as x86 requires
         direct trampolines to have WITH_REGS.
      
         This wasn't noticed because direct trampolines work as long as the
         function they are attached to is not shared with other callbacks (like
         the function tracer). When there are other callbacks, a helper
         trampoline is called to call all the non-direct callbacks, and when
         it returns, the direct trampoline is called.
      
         For x86, the direct trampoline sets a flag in the regs field to tell
         the x86 specific code to call the direct trampoline. But this only
         works if the ftrace_ops had WITH_REGS set. ARM does things
         differently, in a way that does not require this. For now, set
         WITH_REGS if the arch supports WITH_REGS (which ARM does not), and
         this makes it work for both ARM64 and x86.
      
       - Fix wasted memory in the saved_cmdlines logic.
      
         The saved_cmdlines is a cache that maps PIDs to COMMs that tracing
         can use. Most trace events only save the PID in the event. The
         saved_cmdlines file lists PIDs to COMMs so that the tracing tools can
         show an actual name and not just a PID for each event. There's an
         array of PIDs that map to a small set of saved COMM strings. The
         array has PID_MAX_DEFAULT entries, which is usually 32768. When a
         PID comes in, it adds itself to this array along with the index
         into the COMM array (note that if the system allows more than
         PID_MAX_DEFAULT PIDs, this cache behaves like CPU cache lines: an
         update for a PID whose low bits, the PID modulo PID_MAX_DEFAULT,
         match those of another task will flush that task out).
      
         A while ago, the size of this cache was changed to be dynamic and the
         array was moved into a structure and created with kmalloc(). But this
         new structure had the size of 131104 bytes, or 0x20020 in hex. As
         kmalloc allocates in powers of two, it was actually allocating
         0x40000 bytes (262144) leaving 131040 bytes of wasted memory. The
         last element of this structure was a pointer to the COMM string array
         which defaulted to just saving 128 COMMs.
      
         By changing the last field of this structure to a variable length
         string, and just having it round up to fill the allocated memory, the
         default size of the saved COMM cache is now 8190. This not only uses
         the wasted space, but actually saves space by removing the extra
         allocation for the COMM names.
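The sizes quoted above can be checked with a short C sketch. It assumes a plain power-of-two round-up for the allocation (real kmalloc size classes for large requests are more involved) and a 16-byte COMM slot, taken to be TASK_COMM_LEN. It also models the direct-mapped PID indexing, where only the low bits of a PID select a slot. All function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define PID_MAX_DEFAULT 0x8000 /* 32768, the usual default */
#define COMM_LEN 16            /* assumed TASK_COMM_LEN */

/* Round n up to the next power of two (n must be non-zero). */
static size_t round_up_pow2(size_t n)
{
    size_t p = 1;

    assert(n > 0);
    while (p < n)
        p <<= 1;
    return p;
}

/* Bytes left over when a struct of @size bytes gets a power-of-two allocation. */
static size_t wasted_bytes(size_t size)
{
    return round_up_pow2(size) - size;
}

/* How many COMM strings fit in that leftover space. */
static size_t comm_slots(size_t size)
{
    return wasted_bytes(size) / COMM_LEN;
}

/* Direct-mapped slot for a PID: PIDs that agree in their low bits collide. */
static unsigned pid_slot(unsigned pid)
{
    return pid & (PID_MAX_DEFAULT - 1);
}
```

Under these assumptions, `comm_slots(131104)` evaluates to 131040 / 16 = 8190, the default cache size reported above, and PIDs 100 and 100 + 32768 land in the same slot.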
      
      * tag 'trace-v6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        tracing: Fix wasted memory in saved_cmdlines logic
        ftrace: Fix DIRECT_CALLS to use SAVE_REGS by default
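The direct-trampoline fix in this pull comes down to choosing the ftrace_ops flags at build time, so that saved registers are requested only where the architecture supports them. A minimal sketch of that pattern follows; `ARCH_SUPPORTS_WITH_REGS`, the flag values, and `DIRECT_OPS_FLAGS` are hypothetical names, not the kernel's actual Kconfig symbols or macros.

```c
#include <assert.h>

/* Hypothetical build-time switch: 1 for an x86-like arch, 0 for an ARM64-like arch. */
#ifndef ARCH_SUPPORTS_WITH_REGS
#define ARCH_SUPPORTS_WITH_REGS 1
#endif

/* Hypothetical flag bits modelled loosely on the ftrace_ops flags. */
#define OPS_FL_DIRECT    (1u << 0)
#define OPS_FL_SAVE_REGS (1u << 1)

/* Ask for saved registers only where the arch can provide them. */
#if ARCH_SUPPORTS_WITH_REGS
#define DIRECT_OPS_FLAGS (OPS_FL_DIRECT | OPS_FL_SAVE_REGS)
#else
#define DIRECT_OPS_FLAGS OPS_FL_DIRECT
#endif

static unsigned direct_ops_flags(void)
{
    return DIRECT_OPS_FLAGS;
}
```

Building with `-DARCH_SUPPORTS_WITH_REGS=0` drops the SAVE_REGS bit, mirroring how an arch without WITH_REGS support is left unaffected.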
    • Merge tag 'probes-fixes-v6.8-rc3' of... · 6dc512a0
      Linus Torvalds authored
      Merge tag 'probes-fixes-v6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
      
      Pull probes fixes from Masami Hiramatsu:
      
       - remove unnecessary initial values of kprobes local variables
      
       - probe-events parser bug fixes:
      
          - calculate the argument size and format string after setting type
            information from BTF, because BTF can change the size and format
            string.
      
          - show $comm parse error correctly instead of failing silently.
      
      * tag 'probes-fixes-v6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        kprobes: Remove unnecessary initial values of variables
        tracing/probes: Fix to set arg size and fmt after setting type from BTF
        tracing/probes: Fix to show a parse error for bad type for $comm
    • Merge tag 'efi-fixes-for-v6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi · e6f39a90
      Linus Torvalds authored
      Pull EFI fixes from Ard Biesheuvel:
       "The only notable change here is the patch that changes the way we deal
        with spurious errors from the EFI memory attribute protocol. This will
        be backported to v6.6, and is intended to ensure that we will not
        paint ourselves into a corner when we tighten this further in order to
        comply with MS requirements on signed EFI code.
      
        Note that this protocol does not currently exist in x86 production
        systems in the field, only in Microsoft's fork of OVMF, but it will be
        mandatory for Windows logo certification for x86 PCs in the future.
      
         - Tighten ELF relocation checks on the RISC-V EFI stub
      
         - Give up if the new EFI memory attributes protocol fails spuriously
           on x86
      
         - Take care not to place the kernel in the lowest 16 MB of DRAM on
           x86
      
         - Omit special purpose EFI memory from memblock
      
         - Some fixes for the CXL CPER reporting code
      
         - Make the PE/COFF layout of mixed-mode capable images comply with a
           strict interpretation of the spec"
      
      * tag 'efi-fixes-for-v6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
        x86/efistub: Use 1:1 file:memory mapping for PE/COFF .compat section
        cxl/trace: Remove unnecessary memcpy's
        cxl/cper: Fix errant CPER prints for CXL events
        efi: Don't add memblocks for soft-reserved memory
        efi: runtime: Fix potential overflow of soft-reserved region size
        efi/libstub: Add one kernel-doc comment
        x86/efistub: Avoid placing the kernel below LOAD_PHYSICAL_ADDR
        x86/efistub: Give up if memory attribute protocol returns an error
        riscv/efistub: Tighten ELF relocation check
        riscv/efistub: Ensure GP-relative addressing is not used
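The rule that the kernel must not be placed in the lowest 16 MB of DRAM on x86 amounts to clamping the candidate load address. A hedged sketch: `pick_load_addr()` is a hypothetical helper, LOAD_PHYSICAL_ADDR is assumed to be 16 MB (its common x86 default), and the result is rounded up to an assumed 2 MB alignment; the real EFI stub allocation logic is more involved.

```c
#include <assert.h>
#include <stdint.h>

#define LOAD_PHYSICAL_ADDR UINT64_C(0x1000000) /* assumed: 16 MB */
#define KERNEL_ALIGN       UINT64_C(0x200000)  /* assumed: 2 MB */

/* Round @x up to a multiple of @align, which must be a power of two. */
static uint64_t align_up(uint64_t x, uint64_t align)
{
    assert((align & (align - 1)) == 0);
    return (x + align - 1) & ~(align - 1);
}

/* Never return an address below LOAD_PHYSICAL_ADDR. */
static uint64_t pick_load_addr(uint64_t candidate)
{
    if (candidate < LOAD_PHYSICAL_ADDR)
        candidate = LOAD_PHYSICAL_ADDR;
    return align_up(candidate, KERNEL_ALIGN);
}
```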
    • Merge tag 'pci-v6.8-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci · 5ddfc246
      Linus Torvalds authored
      Pull pci fixes from Bjorn Helgaas:
      
       - Fix an unintentional truncation of DWC MSI-X address to 32 bits and
         update similar MSI code to match (Dan Carpenter)
      
      * tag 'pci-v6.8-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
        PCI: dwc: Clean up dw_pcie_ep_raise_msi_irq() alignment
        PCI: dwc: Fix a 64bit bug in dw_pcie_ep_raise_msix_irq()
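The shortlog above names a 64-bit bug in dw_pcie_ep_raise_msix_irq(). One common way a 64-bit address is unintentionally truncated to 32 bits is clearing its low bits with the complement of a 32-bit offset: the complement is computed in 32 bits and then zero-extended, which wipes the upper half. The sketch below is a hypothetical illustration of that bug class, not the driver code; both helper names are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Buggy: ~off is computed as a 32-bit value, then zero-extended, clearing bits 63..32. */
static uint64_t align_down_buggy(uint64_t addr, uint32_t page_size)
{
    uint32_t off = (uint32_t)(addr & (page_size - 1));
    return addr & ~off;
}

/* Fixed: keep the offset in 64 bits so the upper half of the address survives. */
static uint64_t align_down_fixed(uint64_t addr, uint32_t page_size)
{
    uint64_t off = addr & ((uint64_t)page_size - 1);
    return addr & ~off;
}
```

For an address above 4 GB, the buggy variant returns an address below 4 GB, so the MSI-X write would target the wrong location.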