1. 09 Apr, 2024 2 commits
    • KVM: x86: Move nEPT exit_qualification field from kvm_vcpu_arch to x86_exception · a9466078
      Sean Christopherson authored
      Move the exit_qualification field that is used to track information about
      in-flight nEPT violations from "struct kvm_vcpu_arch" to "x86_exception",
      i.e. associate the information with the actual nEPT violation instead of
      the vCPU.  To handle bits that are pulled from vmcs.EXIT_QUALIFICATION,
      i.e. that are propagated from the "original" EPT violation VM-Exit, simply
      grab them from the VMCS on-demand when injecting a nEPT Violation or a PML
      Full VM-exit.
      
      Aside from being ugly, having an exit_qualification field in kvm_vcpu_arch
      is outright dangerous, e.g. see commit d7f0a00e ("KVM: VMX: Report
      up-to-date exit qualification to userspace").
      
      Opportunistically add a comment to call out that PML Full and EPT Violation
      VM-Exits use the same bit to report NMI blocking information.
      
      Link: https://lore.kernel.org/r/20240209221700.393189-3-seanjc@google.com
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      a9466078
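A minimal sketch of the idea in commit a9466078 above, assuming a simplified model: the fault-specific bits travel with a per-fault record, and the bits inherited from the original hardware exit are masked in only at injection time. The struct, function names, and bit positions here are illustrative assumptions, not KVM's actual definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative bit masks only; the names and positions are assumptions. */
#define QUAL_WRITE_ACCESS (UINT64_C(1) << 1)
#define QUAL_GVA_BITS     (UINT64_C(3) << 7)
#define QUAL_NMI_UNBLOCK  (UINT64_C(1) << 12)

/* Hypothetical per-fault record standing in for the exception structure. */
struct nested_fault {
    uint64_t exit_qualification; /* bits derived for this specific fault */
};

/*
 * Exit qualification for an injected nested EPT violation: the bits that
 * belong to the fault itself, plus the bits propagated from the original
 * hardware exit (hw_qual), which are read on demand.
 */
uint64_t violation_qual(const struct nested_fault *fault, uint64_t hw_qual)
{
    assert(fault != NULL);
    return fault->exit_qualification | (hw_qual & QUAL_GVA_BITS);
}

/*
 * Exit qualification for an injected PML Full exit: only the NMI-related
 * bit, which both exit types report in the same position, is carried over.
 */
uint64_t pml_full_qual(uint64_t hw_qual)
{
    return hw_qual & QUAL_NMI_UNBLOCK;
}
```

Because nothing is cached on the vCPU in this model, a later, unrelated exit cannot observe bits left over from an earlier violation.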
    • KVM: nVMX: Clear EXIT_QUALIFICATION when injecting an EPT Misconfig · 0c476514
      Sean Christopherson authored
      Explicitly clear the EXIT_QUALIFICATION field when injecting an EPT
      misconfig into L1, as required by the VMX architecture.  Per the SDM:
      
        This field is saved for VM exits due to the following causes:
        debug exceptions; page-fault exceptions; start-up IPIs (SIPIs);
        system-management interrupts (SMIs) that arrive immediately after the
        execution of I/O instructions; task switches; INVEPT; INVLPG; INVPCID;
        INVVPID; LGDT; LIDT; LLDT; LTR; SGDT; SIDT; SLDT; STR; VMCLEAR; VMPTRLD;
        VMPTRST; VMREAD; VMWRITE; VMXON; WBINVD; WBNOINVD; XRSTORS; XSAVES;
        control-register accesses; MOV DR; I/O instructions; MWAIT; accesses to
        the APIC-access page; EPT violations; EOI virtualization; APIC-write
        emulation; page-modification log full; SPP-related events; and
        instruction timeout. For all other VM exits, this field is cleared.
      
      Generating EXIT_QUALIFICATION from vcpu->arch.exit_qualification is wrong
      for all (two) paths that lead to nested_ept_inject_page_fault().  For EPT
      violations (the common case), vcpu->arch.exit_qualification will have been
      set by handle_ept_violation() to vmcs02.EXIT_QUALIFICATION, i.e. contains
      the information of an EPT violation and thus is likely non-zero.
      
      For an EPT misconfig, which can reach FNAME(walk_addr_generic) and thus
      inject a nEPT misconfig if KVM created an MMIO SPTE that became stale,
      vcpu->arch.exit_qualification will hold the information from the last EPT
      violation VM-Exit, as vcpu->arch.exit_qualification is _only_ written by
      handle_ept_violation().
      
      Fixes: 4704d0be ("KVM: nVMX: Exiting from L2 to L1")
      Link: https://lore.kernel.org/r/20240209221700.393189-2-seanjc@google.com
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      0c476514
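The bug described in commit 0c476514 above can be modeled in a few lines. This is a hedged sketch, not KVM code: `struct vcpu_model` and both functions are hypothetical names that stand in for the old per-vCPU field and for the injection path before and after the fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Models the old design: one field per vCPU, written only on EPT violations. */
struct vcpu_model {
    uint64_t last_violation_qual;
};

/*
 * Old behavior (buggy for misconfigs): report whatever the most recent
 * EPT violation left behind, regardless of what is being injected now.
 */
uint64_t stale_inject_qual(const struct vcpu_model *vcpu)
{
    assert(vcpu != NULL);
    return vcpu->last_violation_qual;
}

/*
 * Fixed behavior: an EPT misconfig is not among the exits for which the
 * field is saved, so it must be reported as zero.
 */
uint64_t fixed_inject_qual(bool is_misconfig, uint64_t fault_qual)
{
    return is_misconfig ? 0 : fault_qual;
}
```

With a leftover value such as `0x182` in the per-vCPU field, the old path would hand that value to L1 even for a misconfig, while the fixed path returns 0.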
  2. 07 Apr, 2024 4 commits
  3. 06 Apr, 2024 13 commits
  4. 05 Apr, 2024 21 commits
    • Merge tag 'io_uring-6.9-20240405' of git://git.kernel.dk/linux · 4f72ed49
      Linus Torvalds authored
      Pull io_uring fixes from Jens Axboe:
      
       - Backport of some fixes that came up during development of the 6.10
         io_uring patches. This includes some kbuf cleanups and reference
         fixes.
      
       - Disable multishot read if we don't have NOWAIT support on the target
      
       - Fix for a dependency issue with workqueue flushing
      
      * tag 'io_uring-6.9-20240405' of git://git.kernel.dk/linux:
        io_uring/kbuf: hold io_buffer_list reference over mmap
        io_uring/kbuf: protect io_buffer_list teardown with a reference
        io_uring/kbuf: get rid of bl->is_ready
        io_uring/kbuf: get rid of lower BGID lists
        io_uring: use private workqueue for exit work
        io_uring: disable io-wq execution of multishot NOWAIT requests
        io_uring/rw: don't allow multishot reads without NOWAIT support
      4f72ed49
    • Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 4de2ff26
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "The most important is the libsas fix, which is a problem for DMA to a
        kmalloc'd structure too small causing cache line interference. The
        other fixes (all in drivers) are mostly for allocation length fixes,
        error leg unwinding, suspend races and a missing retry"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: ufs: core: Fix MCQ mode dev command timeout
        scsi: libsas: Align SMP request allocation to ARCH_DMA_MINALIGN
        scsi: sd: Unregister device if device_add_disk() failed in sd_probe()
        scsi: ufs: core: WLUN suspend dev/link state error recovery
        scsi: mylex: Fix sysfs buffer lengths
      4de2ff26
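The libsas item in the SCSI pull above concerns a DMA buffer smaller than the minimum DMA alignment sharing cache lines with unrelated data. One common remedy, and the one the commit title suggests, is to round the allocation size up to that alignment. The sketch below only illustrates the rounding; `SKETCH_DMA_MINALIGN` and both function names are assumptions, and the real alignment value is architecture-specific.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed minimum DMA alignment for this sketch; the real value varies. */
#define SKETCH_DMA_MINALIGN 128u

/* Round size up to the next multiple of align (align must be a power of two). */
size_t align_up(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}

/* Size to request so the buffer never shares a DMA granule with other data. */
size_t dma_safe_size(size_t size)
{
    return align_up(size, SKETCH_DMA_MINALIGN);
}
```

Under the assumed alignment, an 8-byte request would be padded to a full 128-byte granule.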
    • Merge tag 'devicetree-fixes-for-6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux · 84985eb2
      Linus Torvalds authored
      Pull devicetree fixes from Rob Herring:
      
       - Fix NIOS2 boot with external DTB
      
       - Add missing synchronization needed between fw_devlink and DT overlay
         removals
      
       - Fix some unit-address regex's to be hex only
      
       - Drop some 10+ year old "unstable binding" statements
      
       - Add new SoCs to QCom UFS binding
      
       - Add TPM bindings to TPM maintainers
      
      * tag 'devicetree-fixes-for-6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
        nios2: Only use built-in devicetree blob if configured to do so
        dt-bindings: timer: narrow regex for unit address to hex numbers
        dt-bindings: soc: fsl: narrow regex for unit address to hex numbers
        dt-bindings: remoteproc: ti,davinci: remove unstable remark
        dt-bindings: clock: ti: remove unstable remark
        dt-bindings: clock: keystone: remove unstable remark
        of: module: prevent NULL pointer dereference in vsnprintf()
        dt-bindings: ufs: qcom: document SM6125 UFS
        dt-bindings: ufs: qcom: document SC7180 UFS
        dt-bindings: ufs: qcom: document SC8180X UFS
        of: dynamic: Synchronize of_changeset_destroy() with the devlink removals
        driver core: Introduce device_link_wait_removal()
        docs: dt-bindings: add missing address/size-cells to example
        MAINTAINERS: Add TPM DT bindings to TPM maintainers
      84985eb2
    • Merge tag 'mm-hotfixes-stable-2024-04-05-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm · af709adf
      Linus Torvalds authored
      Pull misc fixes from Andrew Morton:
       "8 hotfixes, 3 are cc:stable
      
        There are a couple of fixups for this cycle's vmalloc changes and one
        for the stackdepot changes. And a fix for a very old x86 PAT issue
        which can cause a warning splat"
      
      * tag 'mm-hotfixes-stable-2024-04-05-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
        stackdepot: rename pool_index to pool_index_plus_1
        x86/mm/pat: fix VM_PAT handling in COW mappings
        MAINTAINERS: change vmware.com addresses to broadcom.com
        selftests/mm: include strings.h for ffsl
        mm: vmalloc: fix lockdep warning
        mm: vmalloc: bail out early in find_vmap_area() if vmap is not init
        init: open output files from cpio unpacking with O_LARGEFILE
        mm/secretmem: fix GUP-fast succeeding on secretmem folios
      af709adf
    • Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · c7830236
      Linus Torvalds authored
      Pull arm64 fix from Catalin Marinas:
       "arm64/ptrace fix to use the correct SVE layout based on the saved
        floating point state rather than the TIF_SVE flag. The latter may be
        left on during syscalls even if the SVE state is discarded"
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64/ptrace: Use saved floating point state type to determine SVE layout
      c7830236
    • Merge tag 'riscv-for-linus-6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · 261b8e89
      Linus Torvalds authored
      Pull RISC-V fixes from Palmer Dabbelt:
      
       - A fix for an __{get,put}_kernel_nofault to avoid an uninitialized
         value causing spurious failures
      
       - compat_vdso.so.dbg is now installed to the standard install location
      
       - A fix to avoid initializing PERF_SAMPLE_BRANCH_*-related events, as
         they aren't supported and will just later fail
      
       - A fix to make AT_VECTOR_SIZE_ARCH correct now that we're providing
         AT_MINSIGSTKSZ
      
       - pgprot_nx() is now implemented, which fixes vmap W^X protection
      
       - A fix for the vector save/restore code, which at least manifests as
         corrupted vector state when a signal is taken
      
       - A fix for a race condition in instruction patching
      
       - A fix to avoid leaking the kernel-mode GP to userspace, which is a
         kernel pointer leak that can be used to defeat KASLR in various ways
      
       - A handful of smaller fixes to build warnings, an overzealous printk,
         and some missing tracing annotations
      
      * tag 'riscv-for-linus-6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
        riscv: process: Fix kernel gp leakage
        riscv: Disable preemption when using patch_map()
        riscv: Fix warning by declaring arch_cpu_idle() as noinstr
        riscv: use KERN_INFO in do_trap
        riscv: Fix vector state restore in rt_sigreturn()
        riscv: mm: implement pgprot_nx
        riscv: compat_vdso: align VDSOAS build log
        RISC-V: Update AT_VECTOR_SIZE_ARCH for new AT_MINSIGSTKSZ
        riscv: Mark __se_sys_* functions __used
        drivers/perf: riscv: Disable PERF_SAMPLE_BRANCH_* while not supported
        riscv: compat_vdso: install compat_vdso.so.dbg to /lib/modules/*/vdso/
        riscv: hwprobe: do not produce frtace relocation
        riscv: Fix spurious errors from __get/put_kernel_nofault
        riscv: mm: Fix prototype to avoid discarding const
      261b8e89
    • Merge tag 's390-6.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · 50094473
      Linus Torvalds authored
      Pull s390 fixes from Alexander Gordeev:
      
       - Fix missing NULL pointer check when determining guest/host fault
      
       - Mark all functions in asm/atomic_ops.h, asm/atomic.h and
         asm/preempt.h as __always_inline to avoid unwanted instrumentation
      
       - Fix removal of a Processor Activity Instrumentation (PAI) sampling
         event in PMU device driver
      
       - Align system call table on 8 bytes
      
      * tag 's390-6.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        s390/entry: align system call table on 8 bytes
        s390/pai: fix sampling event removal for PMU device driver
        s390/preempt: mark all functions __always_inline
        s390/atomic: mark all functions __always_inline
        s390/mm: fix NULL pointer dereference
      50094473
    • Merge tag 'pm-6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 2f9fd9e4
      Linus Torvalds authored
      Pull power management fix from Rafael Wysocki:
       "Fix a recent Energy Model change that went against a recent scheduler
        change made independently (Vincent Guittot)"
      
      * tag 'pm-6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        PM: EM: fix wrong utilization estimation in em_cpu_energy()
      2f9fd9e4
    • Merge tag 'thermal-6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · b21defcb
      Linus Torvalds authored
      Pull thermal control fixes from Rafael Wysocki:
       "These fix two power allocator thermal governor issues and an ACPI
        thermal driver regression that all were introduced during the 6.8
        development cycle.
      
        Specifics:
      
         - Allow the power allocator thermal governor to bind to a thermal
           zone without cooling devices and/or without trip points (Nikita
           Travkin)
      
         - Make the ACPI thermal driver register a tripless thermal zone when
           it cannot find any usable trip points instead of returning an error
           from acpi_thermal_add() (Stephen Horvath)"
      
      * tag 'thermal-6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        thermal: gov_power_allocator: Allow binding without trip points
        thermal: gov_power_allocator: Allow binding without cooling devices
        ACPI: thermal: Register thermal zones without valid trip points
      b21defcb
    • Merge tag 'gpio-fixes-for-v6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux · 2e69af16
      Linus Torvalds authored
      Pull gpio fixes from Bartosz Golaszewski:
      
       - make sure GPIO devices are registered with the subsystem before
         trying to return them to a caller of gpio_device_find()
      
       - fix two issues with incorrect sanitization of the interrupt labels
      
      * tag 'gpio-fixes-for-v6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
        gpio: cdev: fix missed label sanitizing in debounce_setup()
        gpio: cdev: check for NULL labels when sanitizing them for irqs
        gpiolib: Fix triggering "kobject: 'gpiochipX' is not initialized, yet" kobject_get() errors
      2e69af16
    • Merge tag 'ata-6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux · 4c3fc345
      Linus Torvalds authored
      Pull ata fixes from Damien Le Moal:
      
       - Compilation warning fixes from Arnd: one in the sata_sx4 driver due
         to an incorrect calculation of the parameters passed to memcpy() and
         another one in the sata_mv driver when CONFIG_PCI is not set
      
       - Drop the owner driver field assignment in the pata_macio driver. That
         is not needed as the PCI core code does that already (Krzysztof)
      
       - Remove an unused field in struct st_ahci_drv_data of the ahci_st
         driver (Christophe)
      
       - Add a missing clock probe error check in the sata_gemini driver
         (Chen)
      
      * tag 'ata-6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
        ata: sata_gemini: Check clk_enable() result
        ata: sata_mv: Fix PCI device ID table declaration compilation warning
        ata: ahci_st: Remove an unused field in struct st_ahci_drv_data
        ata: pata_macio: drop driver owner assignment
        ata: sata_sx4: fix pdc20621_get_from_dimm() on 64-bit
      4c3fc345
    • Merge tag 'sound-6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · c42881d4
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "This became a bit bigger collection of patches, but almost all are
        about device-specific fixes, and should be safe for 6.9:
      
         - Lots of ASoC Intel SOF-related fixes/updates
      
         - Locking fixes in SoundWire drivers
      
         - ASoC AMD ACP/SOF updates
      
         - ASoC ES8326 codec fixes
      
         - HD-audio codec fixes and quirks
      
         - A regression fix in emu10k1 synth code"
      
      * tag 'sound-6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (49 commits)
        ASoC: SOF: Core: Add remove_late() to sof_init_environment failure path
        ASoC: SOF: amd: fix for false dsp interrupts
        ASoC: SOF: Intel: lnl: Disable DMIC/SSP offload on remove
        ASoC: Intel: avs: boards: Add modules description
        ASoC: codecs: ES8326: Removing the control of ADC_SCALE
        ASoC: codecs: ES8326: Solve a headphone detection issue after suspend and resume
        ASoC: codecs: ES8326: modify clock table
        ASoC: codecs: ES8326: Solve error interruption issue
        ALSA: line6: Zero-initialize message buffers
        ALSA: hda/realtek: cs35l41: Support ASUS ROG G634JYR
        ALSA: hda/realtek: Update Panasonic CF-SZ6 quirk to support headset with microphone
        ALSA: hda/realtek: Add sound quirks for Lenovo Legion slim 7 16ARHA7 models
        Revert "ALSA: emu10k1: fix synthesizer sample playback position and caching"
        OSS: dmasound/paula: Mark driver struct with __refdata to prevent section mismatch
        ALSA: hda/realtek: Add quirks for ASUS Laptops using CS35L56
        ASoC: amd: acp: fix for acp_init function error handling
        ASoC: tas2781: mark dvc_tlv with __maybe_unused
        ASoC: ops: Fix wraparound for mask in snd_soc_get_volsw
        ASoC: rt-sdw*: add __func__ to all error logs
        ASoC: rt722-sdca-sdw: fix locking sequence
        ...
      c42881d4
    • Merge tag 'drm-fixes-2024-04-05' of https://gitlab.freedesktop.org/drm/kernel · 89103a16
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "Weekly fixes, mostly xe and i915, amdgpu on a week off, otherwise a
        nouveau fix for a crash with new vulkan cts tests, and a couple of
        cleanups and misc fixes.
      
        display:
         - fix typos in kerneldoc
      
        prime:
         - unbreak dma-buf export for virt-gpu
      
        nouveau:
         - uvmm: fix remap address calculation
         - minor cleanups
      
        panfrost:
         - fix power-transition timeouts
      
        xe:
         - Stop using system_unbound_wq for preempt fences
         - Fix saving unordered rebinding fences by attaching them as kernel
           fences to the vm's resv
         - Fix TLB invalidation fences completing out of order
         - Move rebind TLB invalidation to the ring ops to reduce the latency
      
        i915:
         - A few DisplayPort related fixes
         - eDP PSR fixes
         - Remove some VM space restrictions on older platforms
         - Disable automatic CCS load balancing"
      
      * tag 'drm-fixes-2024-04-05' of https://gitlab.freedesktop.org/drm/kernel: (22 commits)
        drm/xe: Use ordered wq for preempt fence waiting
        drm/xe: Move vma rebinding to the drm_exec locking loop
        drm/xe: Make TLB invalidation fences unordered
        drm/xe: Rework rebinding
        drm/xe: Use ring ops TLB invalidation for rebinds
        drm/i915/mst: Reject FEC+MST on ICL
        drm/i915/mst: Limit MST+DSC to TGL+
        drm/i915/dp: Fix the computation for compressed_bpp for DISPLAY < 13
        drm/i915/gt: Enable only one CCS for compute workload
        drm/i915/gt: Do not generate the command streamer for all the CCS
        drm/i915/gt: Disable HW load balancing for CCS
        drm/i915/gt: Limit the reserved VM space to only the platforms that need it
        drm/i915/psr: Fix intel_psr2_sel_fetch_et_alignment usage
        drm/i915/psr: Move writing early transport pipe src
        drm/i915/psr: Calculate PIPE_SRCSZ_ERLY_TPT value
        drm/i915/dp: Remove support for UHBR13.5
        drm/i915/dp: Fix DSC state HW readout for SST connectors
        drm/display: fix typo
        drm/prime: Unbreak virtgpu dma-buf export
        nouveau/uvmm: fix addr/range calcs for remap operations
        ...
      89103a16
    • stackdepot: rename pool_index to pool_index_plus_1 · a6c1d9cb
      Peter Collingbourne authored
      Commit 3ee34eab ("lib/stackdepot: fix first entry having a 0-handle")
      changed the meaning of the pool_index field to mean "the pool index plus
      1".  This made the code accessing this field less self-documenting, as
      well as causing debuggers such as drgn to not be able to easily remain
      compatible with both old and new kernels, because they typically do that
      by testing for presence of the new field.  Because stackdepot is a
      debugging tool, we should make sure that it is debugger friendly. 
      Therefore, give the field a different name to improve readability as well
      as enabling debugger backwards compatibility.
      
      This is needed in 6.9, which would otherwise become an odd release with
      the new semantics and old name so debuggers wouldn't recognize the new
      semantics there.
      
      Fixes: 3ee34eab ("lib/stackdepot: fix first entry having a 0-handle")
      Link: https://lkml.kernel.org/r/20240402001500.53533-1-pcc@google.com
      Link: https://linux-review.googlesource.com/id/Ib3e70c36c1d230dd0a118dc22649b33e768b9f88
      Signed-off-by: Peter Collingbourne <pcc@google.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: Alexander Potapenko <glider@google.com>
      Acked-by: Marco Elver <elver@google.com>
      Acked-by: Oscar Salvador <osalvador@suse.de>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      a6c1d9cb
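To illustrate why commit a6c1d9cb above prefers the name pool_index_plus_1, here is a small sketch of a handle encoding in which the stored value is the pool index plus one, so that a handle of 0 can mean "no entry". The bit width, the packing by shifts, and the function names are assumptions made for the example and do not reproduce stackdepot's real layout.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed width of the offset part of a handle; purely illustrative. */
#define SKETCH_OFFSET_BITS 10u
#define SKETCH_OFFSET_MASK ((UINT32_C(1) << SKETCH_OFFSET_BITS) - 1)

/* Pack a pool index and an offset; the index is stored as index + 1. */
uint32_t make_handle(uint32_t pool_index, uint32_t offset)
{
    uint32_t pool_index_plus_1 = pool_index + 1;

    return (pool_index_plus_1 << SKETCH_OFFSET_BITS) |
           (offset & SKETCH_OFFSET_MASK);
}

/* Recover the real pool index from a valid (non-zero) handle. */
uint32_t handle_pool_index(uint32_t handle)
{
    assert(handle != 0); /* 0 is reserved for "no entry" */
    return (handle >> SKETCH_OFFSET_BITS) - 1;
}
```

A reader of this code, or a debugger script that checks for the field by name, can tell from the identifier alone that one must be subtracted before the value is used as an index.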
    • x86/mm/pat: fix VM_PAT handling in COW mappings · 04c35ab3
      David Hildenbrand authored
      PAT handling won't do the right thing in COW mappings: the first PTE (or,
      in fact, all PTEs) can be replaced during write faults to point at anon
      folios.  Reliably recovering the correct PFN and cachemode using
      follow_phys() from PTEs will not work in COW mappings.
      
      Using follow_phys(), we might just get the address+protection of the anon
      folio (which is very wrong), or fail on swap/nonswap entries, failing
      follow_phys() and triggering a WARN_ON_ONCE() in untrack_pfn() and
      track_pfn_copy(), not properly calling free_pfn_range().
      
      In free_pfn_range(), we either wouldn't call memtype_free() or would call
      it with the wrong range, possibly leaking memory.
      
      To fix that, let's update follow_phys() to refuse returning anon folios,
      and fall back to using the stored PFN inside vma->vm_pgoff for COW mappings
      if we run into that.
      
      We will now properly handle untrack_pfn() with COW mappings, where we
      don't need the cachemode.  We'll have to fail fork()->track_pfn_copy() if
      the first page was replaced by an anon folio, though: we'd have to store
      the cachemode in the VMA to make this work, likely growing the VMA size.
      
      For now, let's keep it simple and let track_pfn_copy() just fail in that
      case: it would have failed in the past with swap/nonswap entries already,
      and it would have done the wrong thing with anon folios.
      
      Simple reproducer to trigger the WARN_ON_ONCE() in untrack_pfn():
      
      <--- C reproducer --->
       #include <stdio.h>
       #include <sys/mman.h>
       #include <unistd.h>
       #include <liburing.h>
      
       int main(void)
       {
               struct io_uring_params p = {};
               int ring_fd;
               size_t size;
               char *map;
      
               ring_fd = io_uring_setup(1, &p);
               if (ring_fd < 0) {
                       perror("io_uring_setup");
                       return 1;
               }
               size = p.sq_off.array + p.sq_entries * sizeof(unsigned);
      
               /* Map the submission queue ring MAP_PRIVATE */
               map = mmap(0, size, PROT_READ | PROT_WRITE, MAP_PRIVATE,
                          ring_fd, IORING_OFF_SQ_RING);
               if (map == MAP_FAILED) {
                       perror("mmap");
                       return 1;
               }
      
               /* We have at least one page. Let's COW it. */
               *map = 0;
               pause();
               return 0;
       }
      <--- C reproducer --->
      
      On a system with 16 GiB RAM and swap configured:
       # ./iouring &
       # memhog 16G
       # killall iouring
      [  301.552930] ------------[ cut here ]------------
      [  301.553285] WARNING: CPU: 7 PID: 1402 at arch/x86/mm/pat/memtype.c:1060 untrack_pfn+0xf4/0x100
      [  301.553989] Modules linked in: binfmt_misc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_g
      [  301.558232] CPU: 7 PID: 1402 Comm: iouring Not tainted 6.7.5-100.fc38.x86_64 #1
      [  301.558772] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebu4
      [  301.559569] RIP: 0010:untrack_pfn+0xf4/0x100
      [  301.559893] Code: 75 c4 eb cf 48 8b 43 10 8b a8 e8 00 00 00 3b 6b 28 74 b8 48 8b 7b 30 e8 ea 1a f7 000
      [  301.561189] RSP: 0018:ffffba2c0377fab8 EFLAGS: 00010282
      [  301.561590] RAX: 00000000ffffffea RBX: ffff9208c8ce9cc0 RCX: 000000010455e047
      [  301.562105] RDX: 07fffffff0eb1e0a RSI: 0000000000000000 RDI: ffff9208c391d200
      [  301.562628] RBP: 0000000000000000 R08: ffffba2c0377fab8 R09: 0000000000000000
      [  301.563145] R10: ffff9208d2292d50 R11: 0000000000000002 R12: 00007fea890e0000
      [  301.563669] R13: 0000000000000000 R14: ffffba2c0377fc08 R15: 0000000000000000
      [  301.564186] FS:  0000000000000000(0000) GS:ffff920c2fbc0000(0000) knlGS:0000000000000000
      [  301.564773] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  301.565197] CR2: 00007fea88ee8a20 CR3: 00000001033a8000 CR4: 0000000000750ef0
      [  301.565725] PKRU: 55555554
      [  301.565944] Call Trace:
      [  301.566148]  <TASK>
      [  301.566325]  ? untrack_pfn+0xf4/0x100
      [  301.566618]  ? __warn+0x81/0x130
      [  301.566876]  ? untrack_pfn+0xf4/0x100
      [  301.567163]  ? report_bug+0x171/0x1a0
      [  301.567466]  ? handle_bug+0x3c/0x80
      [  301.567743]  ? exc_invalid_op+0x17/0x70
      [  301.568038]  ? asm_exc_invalid_op+0x1a/0x20
      [  301.568363]  ? untrack_pfn+0xf4/0x100
      [  301.568660]  ? untrack_pfn+0x65/0x100
      [  301.568947]  unmap_single_vma+0xa6/0xe0
      [  301.569247]  unmap_vmas+0xb5/0x190
      [  301.569532]  exit_mmap+0xec/0x340
      [  301.569801]  __mmput+0x3e/0x130
      [  301.570051]  do_exit+0x305/0xaf0
      ...
      
      Link: https://lkml.kernel.org/r/20240403212131.929421-3-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Reported-by: Wupeng Ma <mawupeng1@huawei.com>
      Closes: https://lkml.kernel.org/r/20240227122814.3781907-1-mawupeng1@huawei.com
      Fixes: b1a86e15 ("x86, pat: remove the dependency on 'vm_pgoff' in track/untrack pfn vma routines")
      Fixes: 5899329b ("x86: PAT: implement track/untrack of pfnmap regions for x86 - v3")
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      04c35ab3
    • MAINTAINERS: change vmware.com addresses to broadcom.com · 87f0e65c
      Alexey Makhalov authored
      Update all remaining vmware.com email addresses to the current broadcom.com ones.
      
      Add corresponding .mailmap entries for maintainers who contributed in the
      past as the vmware.com address will start bouncing soon.
      
      Maintainership update: Jeff Sipek has left VMware, and Nick Shi will be
      maintaining VMware PTP.
      
      Link: https://lkml.kernel.org/r/20240402232334.33167-1-alexey.makhalov@broadcom.com
      Signed-off-by: Alexey Makhalov <alexey.makhalov@broadcom.com>
      Acked-by: Florian Fainelli <florian.fainelli@broadcom.com>
      Acked-by: Ajay Kaher <ajay.kaher@broadcom.com>
      Acked-by: Ronak Doshi <ronak.doshi@broadcom.com>
      Acked-by: Nick Shi <nick.shi@broadcom.com>
      Acked-by: Bryan Tan <bryan-bt.tan@broadcom.com>
      Acked-by: Vishnu Dasa <vishnu.dasa@broadcom.com>
      Acked-by: Vishal Bhakta <vishal.bhakta@broadcom.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      87f0e65c
    • selftests/mm: include strings.h for ffsl · 176517c9
      Edward Liaw authored
      Got a compilation error on Android for ffsl after 91b80cc5
      ("selftests: mm: fix map_hugetlb failure on 64K page size systems")
      included vm_util.h.
      
      Link: https://lkml.kernel.org/r/20240329185814.16304-1-edliaw@google.com
      Fixes: af605d26 ("selftests/mm: merge util.h into vm_util.h")
      Signed-off-by: Edward Liaw <edliaw@google.com>
      Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Mike Rapoport (IBM)" <rppt@kernel.org>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      176517c9
    • mm: vmalloc: fix lockdep warning · fc2c2269
      Uladzislau Rezki (Sony) authored
      Lockdep reports a possible deadlock in the find_vmap_area_exceed_addr_lock()
      function:
      
      ============================================
      WARNING: possible recursive locking detected
      6.9.0-rc1-00060-ged3ccc57b108-dirty #6140 Not tainted
      --------------------------------------------
      drgn/455 is trying to acquire lock:
      ffff0000c00131d0 (&vn->busy.lock/1){+.+.}-{2:2}, at: find_vmap_area_exceed_addr_lock+0x64/0x124
      
      but task is already holding lock:
      ffff0000c0011878 (&vn->busy.lock/1){+.+.}-{2:2}, at: find_vmap_area_exceed_addr_lock+0x64/0x124
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(&vn->busy.lock/1);
        lock(&vn->busy.lock/1);
      
       *** DEADLOCK ***
      
      This can indeed happen if find_vmap_area_exceed_addr_lock() is called
      concurrently, because it tries to acquire the locks of two nodes at once.
      That was done to prevent the lowest VA found in a previous step from
      being removed.
      
      To address this, the lowest VA is now found first without holding the lock
      of the node where it resides.  As a last step, we check whether the VA is
      still there, because it can go away; if it was removed, we proceed with
      the next lowest.
      
      [akpm@linux-foundation.org: fix comment typos, per Baoquan]
      Link: https://lkml.kernel.org/r/20240328140330.4747-1-urezki@gmail.com
      Fixes: 53becf32 ("mm: vmalloc: support multiple nodes in vread_iter")
      Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
      Tested-by: Jens Axboe <axboe@kernel.dk>
      Tested-by: Omar Sandoval <osandov@fb.com>
      Reported-by: Jens Axboe <axboe@kernel.dk>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sony.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: vmalloc: bail out early in find_vmap_area() if vmap is not init · 4ed91fa9
      Uladzislau Rezki (Sony) authored
      During boot, the s390 system triggers "spinlock bad magic" messages
      if spinlock debugging is enabled:
      
      [    0.465445] BUG: spinlock bad magic on CPU#0, swapper/0
      [    0.465490]  lock: single+0x1860/0x1958, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
      [    0.466067] CPU: 0 PID: 0 Comm: swapper Not tainted 6.8.0-12955-g8e938e39 #1
      [    0.466188] Hardware name: QEMU 8561 QEMU (KVM/Linux)
      [    0.466270] Call Trace:
      [    0.466470]  [<00000000011f26c8>] dump_stack_lvl+0x98/0xd8
      [    0.466516]  [<00000000001dcc6a>] do_raw_spin_lock+0x8a/0x108
      [    0.466545]  [<000000000042146c>] find_vmap_area+0x6c/0x108
      [    0.466572]  [<000000000042175a>] find_vm_area+0x22/0x40
      [    0.466597]  [<000000000012f152>] __set_memory+0x132/0x150
      [    0.466624]  [<0000000001cc0398>] vmem_map_init+0x40/0x118
      [    0.466651]  [<0000000001cc0092>] paging_init+0x22/0x68
      [    0.466677]  [<0000000001cbbed2>] setup_arch+0x52a/0x708
      [    0.466702]  [<0000000001cb6140>] start_kernel+0x80/0x5c8
      [    0.466727]  [<0000000000100036>] startup_continue+0x36/0x40
      
      This happens because such a system tries to access vmap areas before
      the vmalloc initialization is done:
      
      [    0.465490] lock: single+0x1860/0x1958, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
      [    0.466067] CPU: 0 PID: 0 Comm: swapper Not tainted 6.8.0-12955-g8e938e39 #1
      [    0.466188] Hardware name: QEMU 8561 QEMU (KVM/Linux)
      [    0.466270] Call Trace:
      [    0.466470] dump_stack_lvl (lib/dump_stack.c:117)
      [    0.466516] do_raw_spin_lock (kernel/locking/spinlock_debug.c:87 kernel/locking/spinlock_debug.c:115)
      [    0.466545] find_vmap_area (mm/vmalloc.c:1059 mm/vmalloc.c:2364)
      [    0.466572] find_vm_area (mm/vmalloc.c:3150)
      [    0.466597] __set_memory (arch/s390/mm/pageattr.c:360 arch/s390/mm/pageattr.c:393)
      [    0.466624] vmem_map_init (./arch/s390/include/asm/set_memory.h:55 arch/s390/mm/vmem.c:660)
      [    0.466651] paging_init (arch/s390/mm/init.c:97)
      [    0.466677] setup_arch (arch/s390/kernel/setup.c:972)
      [    0.466702] start_kernel (init/main.c:899)
      [    0.466727] startup_continue (arch/s390/kernel/head64.S:35)
      [    0.466811] INFO: lockdep is turned off.
      ...
      [    0.718250] vmalloc init - busy lock init 0000000002871860
      [    0.718328] vmalloc init - busy lock init 00000000028731b8
      
      Some background: it worked before because the lock in question was
      statically defined and initialized.  As of now, the locks and data
      structures are initialized in the vmalloc_init() function.
      
      To address the issue, add a check of whether the "vmap_initialized"
      variable is set; if not, find_vmap_area() bails out on entry, returning
      NULL.
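
      The shape of such a guard can be sketched in plain C.  This is a
      minimal userspace model, not the kernel change itself: the names
      table_initialized, table_init(), table_add() and table_find() are
      hypothetical, with table_initialized playing the role that
      "vmap_initialized" plays above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TABLE_MAX 8

/* Hypothetical stand-in for a mapped area. */
struct area {
    unsigned long start;
    unsigned long size;
};

static bool table_initialized; /* false until table_init() has run */
static struct area table[TABLE_MAX];
static size_t table_len;

/* Set up the lookup structures; lookups are only meaningful after this. */
static void table_init(void)
{
    table_len = 0;
    table_initialized = true;
}

static void table_add(unsigned long start, unsigned long size)
{
    assert(table_initialized && table_len < TABLE_MAX);
    table[table_len].start = start;
    table[table_len].size = size;
    table_len++;
}

/* Return the area containing addr, or NULL if there is none. */
static struct area *table_find(unsigned long addr)
{
    /* Bail out on entry: nothing below is valid before table_init(). */
    if (!table_initialized)
        return NULL;

    for (size_t i = 0; i < table_len; i++) {
        if (addr >= table[i].start && addr - table[i].start < table[i].size)
            return &table[i];
    }
    return NULL;
}
```

      An early caller gets NULL, the same answer it would get for "no such
      area", instead of touching locks or data that are not yet set up.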
      
      Link: https://lkml.kernel.org/r/20240323141544.4150-1-urezki@gmail.com
      Fixes: 72210662 ("mm: vmalloc: offload free_vmap_area_lock lock")
      Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
      Tested-by: Guenter Roeck <linux@roeck-us.net>
      Reviewed-by: Baoquan He <bhe@redhat.com>
      Acked-by: Heiko Carstens <hca@linux.ibm.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sony.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • init: open output files from cpio unpacking with O_LARGEFILE · 8434f9aa
      John Sperbeck authored
      If a member of a cpio archive for an initrd or initramfs is larger than
      2 GiB, we'll eventually fail to write to that file when we reach that
      limit, unless O_LARGEFILE is set.
      
      The problem can be seen with this recipe, assuming that BLK_DEV_RAM
      is not configured:
      
      cd /tmp
      dd if=/dev/zero of=BIGFILE bs=1048576 count=2200
      echo BIGFILE | cpio -o -H newc -R root:root > initrd.img
      kexec -l /boot/vmlinuz-$(uname -r) --initrd=initrd.img --reuse-cmdline
      kexec -e
      
      The console will show 'Initramfs unpacking failed: write error'.  With
      the patch, the error is gone.
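
      For comparison, the userspace equivalent of passing the flag looks
      roughly like the sketch below.  It is an illustration only:
      open_output_file() is a hypothetical helper, not the function the
      patch touches, and the fallback definition of O_LARGEFILE as 0 is an
      assumption for environments where the headers do not expose the flag.

```c
#define _GNU_SOURCE /* asks glibc headers to expose O_LARGEFILE */
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: if the headers do not define the flag, treat it as a no-op. */
#ifndef O_LARGEFILE
#define O_LARGEFILE 0
#endif

/*
 * Hypothetical helper: open an output file for writing so that writes
 * past the 2 GiB offset are not rejected for lack of large-file support.
 * Returns a file descriptor, or -1 with errno set on failure.
 */
static int open_output_file(const char *path, mode_t mode)
{
    assert(path != NULL);
    return open(path, O_WRONLY | O_CREAT | O_TRUNC | O_LARGEFILE, mode);
}
```

      A caller would use the returned descriptor with write() as usual and
      close() it when done.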
      
      Link: https://lkml.kernel.org/r/20240323152934.3307391-1-jsperbeck@google.com
      Signed-off-by: John Sperbeck <jsperbeck@google.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/secretmem: fix GUP-fast succeeding on secretmem folios · 65291dcf
      David Hildenbrand authored
      folio_is_secretmem() currently relies on secretmem folios being LRU
      folios, to save some cycles.
      
      However, folios might reside in a folio batch without the LRU flag set, or
      temporarily have their LRU flag cleared.  Consequently, the LRU flag is
      unreliable for this purpose.
      
      In particular, this is the case when secretmem_fault() allocates a fresh
      page and calls filemap_add_folio()->folio_add_lru().  The folio might be
      added to the per-cpu folio batch and won't get the LRU flag set until the
      batch is drained using, e.g., lru_add_drain().
      
      Consequently, folio_is_secretmem() might not detect secretmem folios and
      GUP-fast can succeed in grabbing a secretmem folio, crashing the kernel
      when we later try to read from or write to the folio, because the folio
      has been unmapped from the direct map.
      
      Fix it by removing that unreliable check.
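
      The effect of dropping the shortcut can be modeled in a few lines of
      C.  This is a simplified illustration, not the kernel's
      folio_is_secretmem(): struct folio_model, its lru field, and the two
      helper functions are hypothetical, and the real check involves more
      than an address-space-operations comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for address-space operations tables. */
struct aops_model {
    int unused;
};

static const struct aops_model secretmem_aops_model = { 0 };
static const struct aops_model other_aops_model = { 0 };

/* Hypothetical folio: an LRU flag plus the owner's operations table. */
struct folio_model {
    bool lru; /* may still be clear while the folio sits in a batch */
    const struct aops_model *aops;
};

/* Old behavior: skip the real check whenever the LRU flag is clear. */
static bool is_secretmem_old(const struct folio_model *f)
{
    assert(f != NULL);
    if (!f->lru)
        return false; /* unreliable shortcut */
    return f->aops == &secretmem_aops_model;
}

/* Fixed behavior: always perform the ownership check. */
static bool is_secretmem_fixed(const struct folio_model *f)
{
    assert(f != NULL);
    return f->aops == &secretmem_aops_model;
}
```

      For a freshly faulted secretmem folio that is still in a per-cpu
      batch (lru clear), the old variant answers "not secretmem" while the
      fixed variant identifies it correctly.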
      
      Link: https://lkml.kernel.org/r/20240326143210.291116-2-david@redhat.com
      Fixes: 1507f512 ("mm: introduce memfd_secret system call to create "secret" memory areas")
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Reported-by: xingwei lee <xrivendell7@gmail.com>
      Reported-by: yue sun <samsun1006219@gmail.com>
      Closes: https://lore.kernel.org/lkml/CABOYnLyevJeravW=QrH0JUPYEcDN160aZFb7kwndm-J2rmz0HQ@mail.gmail.com/
      Debugged-by: Miklos Szeredi <miklos@szeredi.hu>
      Tested-by: Miklos Szeredi <mszeredi@redhat.com>
      Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>