1. 23 Oct, 2022 1 commit
    • Merge tag 'io_uring-6.1-2022-10-22' of git://git.kernel.dk/linux · 942e01ab
      Linus Torvalds authored
      Pull io_uring follow-up from Jens Axboe:
       "Currently the zero-copy has automatic fallback to normal transmit, and
        it was decided that it'd be cleaner to return an error instead if the
        socket type doesn't support it.
      
        Zero-copy does work with UDP and TCP, it's more of a future proofing
        kind of thing (eg for samba)"
      
      * tag 'io_uring-6.1-2022-10-22' of git://git.kernel.dk/linux:
        io_uring/net: fail zc sendmsg when unsupported by socket
        io_uring/net: fail zc send when unsupported by socket
        net: flag sockets supporting msghdr originated zerocopy
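
      As a rough user-space illustration of the behaviour change described in
      the pull message (error instead of silent fallback), the check might be
      modelled as below. The struct, the function name and the choice of
      -EOPNOTSUPP are assumptions made for this sketch, not the kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a socket; not the kernel's struct socket. */
struct model_sock {
    bool supports_zc; /* set by protocols that handle msghdr-originated zerocopy */
};

/*
 * Before: an unsupported socket silently fell back to a normal (copying) send.
 * After:  the request fails, so the caller learns zero-copy is unavailable.
 * The errno value is an assumption chosen for illustration.
 */
static int model_send_zc(const struct model_sock *sk)
{
    assert(sk != NULL);
    if (!sk->supports_zc)
        return -EOPNOTSUPP;
    return 0; /* the zero-copy transmit would proceed here */
}
```

      A caller that still wants the old behaviour can retry with a regular
      send when it sees the error.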
  2. 22 Oct, 2022 14 commits
  3. 21 Oct, 2022 25 commits
    • Merge tag '6.1-rc1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6 · bd8e9634
      Linus Torvalds authored
      Pull cifs fixes from Steve French:
      
       - memory leak fixes
      
       - fixes for directory leases, including an important one which fixes a
         problem noticed by git functional tests
      
       - fixes relating to missing free_xid calls (helpful for
         tracing/debugging of entry/exit into cifs.ko)
      
       - a multichannel fix
      
       - a small cleanup fix (use of list_move instead of list_del/list_add)
      
      * tag '6.1-rc1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
        cifs: update internal module number
        cifs: fix memory leaks in session setup
        cifs: drop the lease for cached directories on rmdir or rename
        smb3: interface count displayed incorrectly
        cifs: Fix memory leak when build ntlmssp negotiate blob failed
        cifs: set rc to -ENOENT if we can not get a dentry for the cached dir
        cifs: use LIST_HEAD() and list_move() to simplify code
        cifs: Fix xid leak in cifs_get_file_info_unix()
        cifs: Fix xid leak in cifs_ses_add_channel()
        cifs: Fix xid leak in cifs_flock()
        cifs: Fix xid leak in cifs_copy_file_range()
        cifs: Fix xid leak in cifs_create()
    • Merge tag 'nfsd-6.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux · 022c028f
      Linus Torvalds authored
      Pull nfsd fixes from Chuck Lever:
       "Fixes for patches merged in v6.1"
      
      * tag 'nfsd-6.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
        nfsd: ensure we always call fh_verify_error tracepoint
        NFSD: unregister shrinker when nfsd_init_net() fails
    • Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · ed537795
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "Two small changes, one in the lpfc driver and the other in the core.
      
        The core change is an additional footgun guard which prevents users
        from writing the wrong state to sysfs and causing a hang"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: lpfc: Fix memory leak in lpfc_create_port()
        scsi: core: Restrict legal sdev_state transitions via sysfs
    • Merge tag 'block-6.1-2022-10-20' of git://git.kernel.dk/linux · d4b7332e
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
      
       - NVMe pull request via Christoph:
            - fix nvme-hwmon for DMA non-coherent architectures (Serge Semin)
            - add an nvme-hwmon maintainer (Christoph Hellwig)
            - fix error pointer dereference in error handling (Dan Carpenter)
            - fix invalid memory reference in nvmet_subsys_attr_qid_max_show
              (Daniel Wagner)
            - don't limit the DMA segment size in nvme-apple (Russell King)
            - fix workqueue MEM_RECLAIM flushing dependency (Sagi Grimberg)
            - disable write zeroes on various Kingston SSDs (Xander Li)
      
       - fix a memory leak with block device tracing (Ye)
      
       - flexible-array fix for ublk (Yushan)
      
       - document the ublk recovery feature from this merge window
         (ZiyangZhang)
      
       - remove dead bfq variable in struct (Yuwei)
      
       - error handling rq clearing fix (Yu)
      
       - add an IRQ safety check for the cached bio freeing (Pavel)
      
       - drbd bio cloning fix (Christoph)
      
      * tag 'block-6.1-2022-10-20' of git://git.kernel.dk/linux:
        blktrace: remove unnessary stop block trace in 'blk_trace_shutdown'
        blktrace: fix possible memleak in '__blk_trace_remove'
        blktrace: introduce 'blk_trace_{start,stop}' helper
        bio: safeguard REQ_ALLOC_CACHE bio put
        block, bfq: remove unused variable for bfq_queue
        drbd: only clone bio if we have a backing device
        ublk_drv: use flexible-array member instead of zero-length array
        nvmet: fix invalid memory reference in nvmet_subsys_attr_qid_max_show
        nvmet: fix workqueue MEM_RECLAIM flushing dependency
        nvme-hwmon: kmalloc the NVME SMART log buffer
        nvme-hwmon: consistently ignore errors from nvme_hwmon_init
        nvme: add Guenther as nvme-hwmon maintainer
        nvme-apple: don't limit DMA segement size
        nvme-pci: disable write zeroes on various Kingston SSD
        nvme: fix error pointer dereference in error handling
        Documentation: document ublk user recovery feature
        blk-mq: fix null pointer dereference in blk_mq_clear_rq_mapping()
    • Merge tag 'io_uring-6.1-2022-10-20' of git://git.kernel.dk/linux · 294e73ff
      Linus Torvalds authored
      Pull io_uring fixes from Jens Axboe:
      
       - Fix a potential memory leak in the error handling path of io-wq setup
         (Rafael)
      
       - Kill an errant debug statement that got added in this release (me)
      
       - Fix an oops with an invalid direct descriptor with IORING_OP_MSG_RING
         (Harshit)
      
       - Remove unneeded FFS_SCM flagging (Pavel)
      
       - Remove polling off the exit path (Pavel)
      
       - Move out direct descriptor debug check to the cleanup path (Pavel)
      
       - Use the proper helper rather than open-coding cached request get
         (Pavel)
      
      * tag 'io_uring-6.1-2022-10-20' of git://git.kernel.dk/linux:
        io-wq: Fix memory leak in worker creation
        io_uring/msg_ring: Fix NULL pointer dereference in io_msg_send_fd()
        io_uring/rw: remove leftover debug statement
        io_uring: don't iopoll from io_ring_ctx_wait_and_kill()
        io_uring: reuse io_alloc_req()
        io_uring: kill hot path fixed file bitmap debug checks
        io_uring: remove FFS_SCM
    • Merge tag 'for-linus-6.1-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · 1d61754c
      Linus Torvalds authored
      Pull xen fixes from Juergen Gross:
       "Just two fixes for the new 'virtio with grants' feature"
      
      * tag 'for-linus-6.1-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        xen/virtio: Convert PAGE_SIZE/PAGE_SHIFT/PFN_UP to Xen counterparts
        xen/virtio: Handle cases when page offset > PAGE_SIZE properly
    • Merge tag 'selinux-pr-20221020' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux · 0de0b768
      Linus Torvalds authored
      Pull selinux fix from Paul Moore:
       "A small SELinux fix for a GFP_KERNEL allocation while a spinlock is
        held.
      
        The patch, while still fairly small, is a bit larger than one might
        expect from a simple s/GFP_KERNEL/GFP_ATOMIC/ conversion because we
        added support for the function to be called with different gfp flags
        depending on the context, preserving GFP_KERNEL for those cases that
        can safely sleep"
      
      * tag 'selinux-pr-20221020' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
        selinux: enable use of both GFP_KERNEL and GFP_ATOMIC in convert_context()
    • Merge tag 'mm-hotfixes-stable-2022-10-20' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm · 440b7895
      Linus Torvalds authored
      Pull misc fixes from Andrew Morton:
       "Seventeen hotfixes, mainly for MM.
      
        Five are cc:stable and the remainder address post-6.0 issues"
      
      * tag 'mm-hotfixes-stable-2022-10-20' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
        nouveau: fix migrate_to_ram() for faulting page
        mm/huge_memory: do not clobber swp_entry_t during THP split
        hugetlb: fix memory leak associated with vma_lock structure
        mm/page_alloc: reduce potential fragmentation in make_alloc_exact()
        mm: /proc/pid/smaps_rollup: fix maple tree search
        mm,hugetlb: take hugetlb_lock before decrementing h->resv_huge_pages
        mm/mmap: fix MAP_FIXED address return on VMA merge
        mm/mmap.c: __vma_adjust(): suppress uninitialized var warning
        mm/mmap: undo ->mmap() when mas_preallocate() fails
        init: Kconfig: fix spelling mistake "satify" -> "satisfy"
        ocfs2: clear dinode links count in case of error
        ocfs2: fix BUG when iput after ocfs2_mknod fails
        gcov: support GCC 12.1 and newer compilers
        zsmalloc: zs_destroy_pool: add size_class NULL check
        mm/mempolicy: fix mbind_range() arguments to vma_merge()
        mailmap: update email for Qais Yousef
        mailmap: update Dan Carpenter's email address
    • Merge tag 'trace-tools-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · ce3d90a8
      Linus Torvalds authored
      Pull tracing tool update from Steven Rostedt:
      
       - Make dot2c generate monitor's automata definition static
      
      * tag 'trace-tools-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        rv/dot2c: Make automaton definition static
    • Merge tag 'linux-watchdog-6.1-rc2' of git://www.linux-watchdog.org/linux-watchdog · 4f1e0c18
      Linus Torvalds authored
      Pull watchdog updates from Wim Van Sebroeck:
      
       - Add tracing events for the most common watchdog events
      
      * tag 'linux-watchdog-6.1-rc2' of git://www.linux-watchdog.org/linux-watchdog:
        watchdog: Add tracing events for the most usual watchdog events
    • Merge branches 'acpi-scan', 'acpi-resource', 'acpi-apei', 'acpi-extlog' and 'acpi-docs' · 3f8deab6
      Rafael J. Wysocki authored
      Merge assorted ACPI fixes for 6.1-rc2:
      
       - Fix resource list walk in acpi_dma_get_range() (Robin Murphy).
      
       - Add IRQ override quirk for LENOVO IdeaPad and extend the IRQ
         override warning message (Jiri Slaby).
      
       - Fix integer overflow in ghes_estatus_pool_init() (Ashish Kalra).
      
       - Fix multiple error records handling in one of the ACPI extlog driver
         code paths (Tony Luck).
      
       - Prune DSDT override documentation from index after dropping it (Bagas
         Sanjaya).
      
      * acpi-scan:
        ACPI: scan: Fix DMA range assignment
      
      * acpi-resource:
        ACPI: resource: note more about IRQ override
        ACPI: resource: do IRQ override on LENOVO IdeaPad
      
      * acpi-apei:
        ACPI: APEI: Fix integer overflow in ghes_estatus_pool_init()
      
      * acpi-extlog:
        ACPI: extlog: Handle multiple records
      
      * acpi-docs:
        Documentation: ACPI: Prune DSDT override documentation from index
    • efi: runtime: Don't assume virtual mappings are missing if VA == PA == 0 · 37926f96
      Ard Biesheuvel authored
      The generic EFI stub can be instructed to avoid SetVirtualAddressMap(),
      and simply run with the firmware's 1:1 mapping. In this case, it
      populates the virtual address fields of the runtime regions in the
      memory map with the physical address of each region, so that the mapping
      code has to be none the wiser. The virtual addresses are wiped only if
      SetVirtualAddressMap() fails, in which case the kernel code knows that
      the regions cannot be mapped.
      
      However, wiping amounts to setting the field to zero, and if a runtime
      region happens to live at physical address 0, its valid 1:1 mapped
      virtual address could be mistaken for a wiped field, resulting in loss of
      access to the EFI services at runtime.
      
      So let's only assume that VA == 0 means 'no runtime services' if the
      region in question does not live at PA 0x0.
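
      The rule in the last paragraph can be sketched as a small predicate. The
      descriptor type and function name below are hypothetical and heavily
      trimmed; this is an illustration of the condition, not the patch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down memory descriptor (the real one has more fields). */
struct model_efi_md {
    uint64_t phys_addr;
    uint64_t virt_addr;
};

/*
 * A zero virtual address is treated as "wiped, no runtime mapping" only
 * when the region does not itself live at physical address 0. For a
 * region at PA 0, VA 0 is a valid 1:1 mapping.
 */
static bool model_md_lacks_mapping(const struct model_efi_md *md)
{
    assert(md != NULL);
    return md->virt_addr == 0 && md->phys_addr != 0;
}
```

      With this predicate, a region at PA 0 with VA 0 keeps its runtime
      services, while a region at any other PA with VA 0 is still rejected.
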
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
    • efi: libstub: Fix incorrect payload size in zboot header · 53a7ea28
      Ard Biesheuvel authored
      The linker script symbol definition that captures the size of the
      compressed payload inside the zboot decompressor (which is exposed via
      the image header) refers to '.' for the end of the region, which does
      not give the correct result as the expression is not placed at the end
      of the payload. So use the symbol name explicitly.
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
    • efi: libstub: Give efi_main() asmlinkage qualification · db14655a
      Ard Biesheuvel authored
      To stop the bots from sending sparse warnings to me and the list about
      efi_main() not having a prototype, decorate it with asmlinkage so that
      it is clear that it is called from assembly, and therefore needs to
      remain external, even if it is never declared in a header file.
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
    • efi: efivars: Fix variable writes without query_variable_store() · 8a254d90
      Ard Biesheuvel authored
      Commit bbc6d2c6 ("efi: vars: Switch to new wrapper layer")
      refactored the efivars layer so that the 'business logic' related to
      which UEFI variables affect the boot flow in which way could be moved
      out of it, and into the efivarfs driver.
      
      This inadvertently broke setting variables on firmware implementations
      that lack the QueryVariableInfo() boot service, because we no longer
      tolerate an EFI_UNSUPPORTED result from check_var_size() when calling
      efivar_entry_set_get_size(), which now ends up calling check_var_size()
      a second time inadvertently.
      
      If QueryVariableInfo() is missing, we support writes of up to 64k -
      let's move that logic into check_var_size(), and drop the redundant
      call.
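
      A minimal user-space sketch of the fallback described above, assuming a
      hypothetical ops table whose query callback is NULL when the firmware
      cannot report free variable space. All names and status values here are
      invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* The "up to 64k" limit mentioned in the commit message. */
#define MODEL_MAX_UNCHECKED_WRITE ((size_t)64 * 1024)

enum model_status { MODEL_SUCCESS, MODEL_OUT_OF_RESOURCES };

/* Hypothetical ops table; query_variable_store may be NULL. */
struct model_efivar_ops {
    enum model_status (*query_variable_store)(size_t size);
};

/*
 * Single place that decides whether a write of 'size' bytes may proceed.
 * Without a query callback, writes up to the fixed limit are accepted.
 */
static enum model_status
model_check_var_size(const struct model_efivar_ops *ops, size_t size)
{
    assert(ops != NULL);
    if (ops->query_variable_store == NULL)
        return size <= MODEL_MAX_UNCHECKED_WRITE ? MODEL_SUCCESS
                                                 : MODEL_OUT_OF_RESOURCES;
    return ops->query_variable_store(size);
}
```

      Keeping the limit inside the size check means callers do not need a
      second, separate check of their own.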
      
      Cc: <stable@vger.kernel.org> # v6.0
      Fixes: bbc6d2c6 ("efi: vars: Switch to new wrapper layer")
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
    • efi: ssdt: Don't free memory if ACPI table was loaded successfully · 4b017e59
      Ard Biesheuvel authored
      Amadeusz reports KASAN use-after-free errors introduced by commit
      3881ee0b ("efi: avoid efivars layer when loading SSDTs from
      variables"). The problem appears to be that the memory that holds the
      new ACPI table is now freed unconditionally, instead of only when the
      ACPI core reported a failure to load the table.
      
      So let's fix this, by omitting the kfree() on success.
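
      The ownership rule can be sketched in user space as follows, with
      malloc()/free() standing in for the kernel allocator and a hypothetical
      callback standing in for the ACPI table load. None of these names come
      from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical loader: returns 0 on success and keeps referencing 'table'. */
typedef int (*model_load_fn)(void *table);

/*
 * Returns the buffer (still in use by the loader) on success, NULL on
 * failure. Only the failure path frees the buffer; freeing it after a
 * successful load would leave the loader with a dangling pointer.
 */
static void *model_load_ssdt(size_t size, model_load_fn load)
{
    assert(load != NULL);
    void *table = malloc(size);
    if (table == NULL)
        return NULL;
    if (load(table) != 0) {
        free(table);
        return NULL;
    }
    return table;
}

/* Two trivial loaders used to exercise both paths. */
static int model_load_ok(void *table)   { (void)table; return 0; }
static int model_load_fail(void *table) { (void)table; return -1; }
```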
      
      Cc: <stable@vger.kernel.org> # v6.0
      Link: https://lore.kernel.org/all/a101a10a-4fbb-5fae-2e3c-76cf96ed8fbd@linux.intel.com/
      Fixes: 3881ee0b ("efi: avoid efivars layer when loading SSDTs from variables")
      Reported-by: Amadeusz Sławiński <amadeuszx.slawinski@linux.intel.com>
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
    • efi: libstub: Remove zboot signing from build options · f57fb375
      Ard Biesheuvel authored
      The zboot decompressor series introduced a feature to sign the PE/COFF
      kernel image for secure boot as part of the kernel build. This was
      necessary because there are actually two images that need to be signed:
      the kernel with the EFI stub attached, and the decompressor application.
      
      This is a bit of a burden, because it means that the images must be
      signed on the same system that performs the build, and this is not
      realistic for distros.
      
      During the next cycle, we will introduce changes to the zboot code so
      that the inner image no longer needs to be signed. This means that the
      outer PE/COFF image can be handled as usual, and be signed later in the
      release process.
      
      Let's remove the associated Kconfig options now so that they don't end
      up in an LTS release while already being deprecated.
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
    • iommu/vt-d: Clean up si_domain in the init_dmars() error path · 620bf9f9
      Jerry Snitselaar authored
      A splat from kmem_cache_destroy() was seen with a kernel prior to
      commit ee2653bb ("iommu/vt-d: Remove domain and devinfo mempool")
      when there was a failure in init_dmars(), because the iommu_domain
      cache still had objects. While the mempool code is now gone, there
      still is a leak of the si_domain memory if init_dmars() fails. So
      clean up si_domain in the init_dmars() error path.
      
      Cc: Lu Baolu <baolu.lu@linux.intel.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Fixes: 86080ccc ("iommu/vt-d: Allocate si_domain in init_dmars()")
      Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com>
      Link: https://lore.kernel.org/r/20221010144842.308890-1-jsnitsel@redhat.com
      Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
    • iommu/vt-d: Allow NVS regions in arch_rmrr_sanity_check() · 5566e68d
      Charlotte Tan authored
      arch_rmrr_sanity_check() warns if the RMRR is not covered by an ACPI
      Reserved region, but it seems like it should accept an NVS region as
      well. The ACPI spec
      https://uefi.org/specs/ACPI/6.5/15_System_Address_Map_Interfaces.html
      uses similar wording for "Reserved" and "NVS" region types; for NVS
      regions it says "This range of addresses is in use or reserved by the
      system and must not be used by the operating system."
      
      There is an old comment on this mailing list that also suggests NVS
      regions should pass the arch_rmrr_sanity_check() test:
      
       The warnings come from arch_rmrr_sanity_check() since it checks whether
       the region is E820_TYPE_RESERVED. However, if the purpose of the check
       is to detect RMRR has regions that may be used by OS as free memory,
       isn't  E820_TYPE_NVS safe, too?
      
      This patch overlaps with another proposed patch that would add the region
      type to the log since sometimes the bug reporter sees this log on the
      console but doesn't know to include the kernel log:
      
      https://lore.kernel.org/lkml/20220611204859.234975-3-atomlin@redhat.com/
      
      Here's an example of the "Firmware Bug" apparent false positive (wrapped
      for line length):
      
       DMAR: [Firmware Bug]: No firmware reserved region can cover this RMRR
             [0x000000006f760000-0x000000006f762fff], contact BIOS vendor for
             fixes
       DMAR: [Firmware Bug]: Your BIOS is broken; bad RMRR
             [0x000000006f760000-0x000000006f762fff]
      
      This is the snippet from the e820 table:
      
       BIOS-e820: [mem 0x0000000068bff000-0x000000006ebfefff] reserved
       BIOS-e820: [mem 0x000000006ebff000-0x000000006f9fefff] ACPI NVS
       BIOS-e820: [mem 0x000000006f9ff000-0x000000006fffefff] ACPI data
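
      Using the three e820 entries quoted above, the effect of accepting NVS
      regions can be sketched as below. The types and helpers are hypothetical,
      and the coverage test is simplified to "contained in one single entry",
      which is an assumption of this sketch rather than how the kernel walks
      the table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum model_e820_type { MODEL_E820_RESERVED, MODEL_E820_NVS, MODEL_E820_ACPI_DATA };

struct model_e820_entry {
    uint64_t start;
    uint64_t end; /* inclusive */
    enum model_e820_type type;
};

/* The three entries from the e820 snippet in the commit message. */
static const struct model_e820_entry model_table[] = {
    { 0x68bff000ULL, 0x6ebfefffULL, MODEL_E820_RESERVED },
    { 0x6ebff000ULL, 0x6f9fefffULL, MODEL_E820_NVS },
    { 0x6f9ff000ULL, 0x6fffefffULL, MODEL_E820_ACPI_DATA },
};

/* Simplification: true only if a single entry of 'type' contains [start, end]. */
static bool model_covered_by(uint64_t start, uint64_t end, enum model_e820_type type)
{
    assert(start <= end);
    for (size_t i = 0; i < sizeof(model_table) / sizeof(model_table[0]); i++) {
        const struct model_e820_entry *e = &model_table[i];
        if (e->type == type && start >= e->start && end <= e->end)
            return true;
    }
    return false;
}

/* Check as it was: only "reserved" regions are accepted. */
static bool model_rmrr_ok_old(uint64_t start, uint64_t end)
{
    return model_covered_by(start, end, MODEL_E820_RESERVED);
}

/* Check as proposed: "reserved" or "ACPI NVS" regions are accepted. */
static bool model_rmrr_ok_new(uint64_t start, uint64_t end)
{
    return model_covered_by(start, end, MODEL_E820_RESERVED) ||
           model_covered_by(start, end, MODEL_E820_NVS);
}
```

      The RMRR from the warning, [0x6f760000-0x6f762fff], lies inside the ACPI
      NVS entry, so the old check rejects it and the new one accepts it.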
      
      Fixes: f036c7fa ("iommu/vt-d: Check VT-d RMRR region in BIOS is reported as reserved")
      Cc: Will Mortensen <will@extrahop.com>
      Link: https://lore.kernel.org/linux-iommu/64a5843d-850d-e58c-4fc2-0a0eeeb656dc@nec.com/
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=216443
      Signed-off-by: Charlotte Tan <charlotte@extrahop.com>
      Reviewed-by: Aaron Tomlin <atomlin@redhat.com>
      Link: https://lore.kernel.org/r/20220929044449.32515-1-charlotte@extrahop.com
      Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
    • iommu/vt-d: Use rcu_lock in get_resv_regions · bf638a65
      Lu Baolu authored
      Commit 5f64ce54 ("iommu/vt-d: Duplicate iommu_resv_region objects
      per device list") converted rcu_lock in get_resv_regions to
      dmar_global_lock to allow sleeping in iommu_alloc_resv_region(). This
      introduced possible recursive locking if get_resv_regions is called from
      within a section where intel_iommu_init() already holds dmar_global_lock.
      
      In particular, after commit 57365a04 ("iommu: Move bus setup to IOMMU
      device registration"), the lockdep splat below is always seen.
      
       ============================================
       WARNING: possible recursive locking detected
       6.0.0-rc4+ #325 Tainted: G          I
       --------------------------------------------
       swapper/0/1 is trying to acquire lock:
       ffffffffa8a18c90 (dmar_global_lock){++++}-{3:3}, at:
       intel_iommu_get_resv_regions+0x25/0x270
      
       but task is already holding lock:
       ffffffffa8a18c90 (dmar_global_lock){++++}-{3:3}, at:
       intel_iommu_init+0x36d/0x6ea
      
       ...
      
       Call Trace:
        <TASK>
        dump_stack_lvl+0x48/0x5f
        __lock_acquire.cold.73+0xad/0x2bb
        lock_acquire+0xc2/0x2e0
        ? intel_iommu_get_resv_regions+0x25/0x270
        ? lock_is_held_type+0x9d/0x110
        down_read+0x42/0x150
        ? intel_iommu_get_resv_regions+0x25/0x270
        intel_iommu_get_resv_regions+0x25/0x270
        iommu_create_device_direct_mappings.isra.28+0x8d/0x1c0
        ? iommu_get_dma_cookie+0x6d/0x90
        bus_iommu_probe+0x19f/0x2e0
        iommu_device_register+0xd4/0x130
        intel_iommu_init+0x3e1/0x6ea
        ? iommu_setup+0x289/0x289
        ? rdinit_setup+0x34/0x34
        pci_iommu_init+0x12/0x3a
        do_one_initcall+0x65/0x320
        ? rdinit_setup+0x34/0x34
        ? rcu_read_lock_sched_held+0x5a/0x80
        kernel_init_freeable+0x28a/0x2f3
        ? rest_init+0x1b0/0x1b0
        kernel_init+0x1a/0x130
        ret_from_fork+0x1f/0x30
        </TASK>
      
      This rolls back dmar_global_lock to rcu_lock in get_resv_regions to avoid
      the lockdep splat.
      
      Fixes: 57365a04 ("iommu: Move bus setup to IOMMU device registration")
      Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
      Tested-by: Alex Williamson <alex.williamson@redhat.com>
      Link: https://lore.kernel.org/r/20220927053109.4053662-3-baolu.lu@linux.intel.com
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
    • iommu: Add gfp parameter to iommu_alloc_resv_region · 0251d010
      Lu Baolu authored
      Add a gfp parameter to iommu_alloc_resv_region() so that callers can
      specify the memory allocation behavior. This makes
      iommu_alloc_resv_region() usable in contexts that cannot sleep as well.
      Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
      Tested-by: Alex Williamson <alex.williamson@redhat.com>
      Link: https://lore.kernel.org/r/20220927053109.4053662-2-baolu.lu@linux.intel.com
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
    • i2c: mlxbf: depend on ACPI; clean away ifdeffage · 65d78b8d
      Adam Borowski authored
      This fixes maybe_unused warnings/errors.
      
      According to a comment during device tree removal, only ACPI is supported,
      thus let's actually require it.
      
      Fixes: be18c5ed ("i2c: mlxbf: remove device tree support")
      Signed-off-by: Adam Borowski <kilobyte@angband.pl>
      Signed-off-by: Wolfram Sang <wsa@kernel.org>
    • nouveau: fix migrate_to_ram() for faulting page · 97061d44
      Alistair Popple authored
      Commit 16ce101d ("mm/memory.c: fix race when faulting a device private
      page") changed the migrate_to_ram() callback to take a reference on the
      device page to ensure it can't be freed while handling the fault.
      Unfortunately, the corresponding update to Nouveau to accommodate this
      change was inadvertently dropped from that patch, causing GPU to CPU
      migration to fail, so add it here.
      
      Link: https://lkml.kernel.org/r/20221019122934.866205-1-apopple@nvidia.com
      Fixes: 16ce101d ("mm/memory.c: fix race when faulting a device private page")
      Signed-off-by: Alistair Popple <apopple@nvidia.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Lyude Paul <lyude@redhat.com>
      Cc: Ben Skeggs <bskeggs@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/huge_memory: do not clobber swp_entry_t during THP split · 71e2d666
      Mel Gorman authored
      The following has been observed when running stressng mmap since commit
      b653db77 ("mm: Clear page->private when splitting or migrating a page")
      
         watchdog: BUG: soft lockup - CPU#75 stuck for 26s! [stress-ng:9546]
         CPU: 75 PID: 9546 Comm: stress-ng Tainted: G            E      6.0.0-revert-b653db77-fix+ #29 0357d79b60fb09775f678e4f3f64ef0579ad1374
         Hardware name: SGI.COM C2112-4GP3/X10DRT-P-Series, BIOS 2.0a 05/09/2016
         RIP: 0010:xas_descend+0x28/0x80
         Code: cc cc 0f b6 0e 48 8b 57 08 48 d3 ea 83 e2 3f 89 d0 48 83 c0 04 48 8b 44 c6 08 48 89 77 18 48 89 c1 83 e1 03 48 83 f9 02 75 08 <48> 3d fd 00 00 00 76 08 88 57 12 c3 cc cc cc cc 48 c1 e8 02 89 c2
         RSP: 0018:ffffbbf02a2236a8 EFLAGS: 00000246
         RAX: ffff9cab7d6a0002 RBX: ffffe04b0af88040 RCX: 0000000000000002
         RDX: 0000000000000030 RSI: ffff9cab60509b60 RDI: ffffbbf02a2236c0
         RBP: 0000000000000000 R08: ffff9cab60509b60 R09: ffffbbf02a2236c0
         R10: 0000000000000001 R11: ffffbbf02a223698 R12: 0000000000000000
         R13: ffff9cab4e28da80 R14: 0000000000039c01 R15: ffff9cab4e28da88
         FS:  00007fab89b85e40(0000) GS:ffff9cea3fcc0000(0000) knlGS:0000000000000000
         CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
         CR2: 00007fab84e00000 CR3: 00000040b73a4003 CR4: 00000000003706e0
         DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
         DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
         Call Trace:
          <TASK>
          xas_load+0x3a/0x50
          __filemap_get_folio+0x80/0x370
          ? put_swap_page+0x163/0x360
          pagecache_get_page+0x13/0x90
          __try_to_reclaim_swap+0x50/0x190
          scan_swap_map_slots+0x31e/0x670
          get_swap_pages+0x226/0x3c0
          folio_alloc_swap+0x1cc/0x240
          add_to_swap+0x14/0x70
          shrink_page_list+0x968/0xbc0
          reclaim_page_list+0x70/0xf0
          reclaim_pages+0xdd/0x120
          madvise_cold_or_pageout_pte_range+0x814/0xf30
          walk_pgd_range+0x637/0xa30
          __walk_page_range+0x142/0x170
          walk_page_range+0x146/0x170
          madvise_pageout+0xb7/0x280
          ? asm_common_interrupt+0x22/0x40
          madvise_vma_behavior+0x3b7/0xac0
          ? find_vma+0x4a/0x70
          ? find_vma+0x64/0x70
          ? madvise_vma_anon_name+0x40/0x40
          madvise_walk_vmas+0xa6/0x130
          do_madvise+0x2f4/0x360
          __x64_sys_madvise+0x26/0x30
          do_syscall_64+0x5b/0x80
          ? do_syscall_64+0x67/0x80
          ? syscall_exit_to_user_mode+0x17/0x40
          ? do_syscall_64+0x67/0x80
          ? syscall_exit_to_user_mode+0x17/0x40
          ? do_syscall_64+0x67/0x80
          ? do_syscall_64+0x67/0x80
          ? common_interrupt+0x8b/0xa0
          entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      The problem can be reproduced with the mmtests config
      config-workload-stressng-mmap.  It does not always happen, and the time
      at which it triggers varies, but it has happened on multiple machines.
      
      The intent of commit b653db77 was to avoid the case where
      PG_private is clear but folio->private is non-NULL.  However, THP tail
      pages use page->private for "swp_entry_t if folio_test_swapcache()" as
      stated in the documentation for struct folio.  This patch only clobbers
      page->private for tail pages if the head page was not in swapcache and
      warns once if page->private had an unexpected value.
      
      Link: https://lkml.kernel.org/r/20221019134156.zjyyn5aownakvztf@techsingularity.net
      Fixes: b653db77 ("mm: Clear page->private when splitting or migrating a page")
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Brian Foster <bfoster@redhat.com>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
      Cc: Seth Jennings <sjenning@redhat.com>
      Cc: Vitaly Wool <vitaly.wool@konsulko.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • hugetlb: fix memory leak associated with vma_lock structure · 612b8a31
      Mike Kravetz authored
      The hugetlb vma_lock structure hangs off the vm_private_data pointer of
      sharable hugetlb vmas.  The structure is vma specific and cannot be
      shared between vmas.  At fork and various other times, vmas are duplicated
      via vm_area_dup().  When this happens, the pointer in the newly created
      vma must be cleared and the structure reallocated.  Two hugetlb specific
      routines deal with this: hugetlb_dup_vma_private and hugetlb_vm_op_open.
      Both routines are called for newly created vmas.  hugetlb_dup_vma_private
      would always clear the pointer and hugetlb_vm_op_open would allocate the
      new vma_lock structure.  This did not work for the calling sequence
      pointed out in [1].
      
        move_vma
          copy_vma
            new_vma = vm_area_dup(vma);
            new_vma->vm_ops->open(new_vma); --> new_vma has its own vma lock.
          is_vm_hugetlb_page(vma)
            clear_vma_resv_huge_pages
              hugetlb_dup_vma_private --> vma->vm_private_data is set to NULL
      
      When hugetlb_dup_vma_private clears the pointer here, we actually leak
      the associated vma_lock structure.
      
      The vma_lock structure contains a pointer to the associated vma.  This
      information can be used in hugetlb_dup_vma_private and hugetlb_vm_op_open
      to ensure we only clear the vm_private_data of newly created (copied)
      vmas.  In such cases, the vma->vma_lock->vma field will not point to the
      vma.
      
      Update hugetlb_dup_vma_private and hugetlb_vm_op_open to not clear
      vm_private_data if vma->vma_lock->vma == vma.  Also, log a warning if
      hugetlb_vm_op_open ever encounters the case where vma_lock has already
      been correctly allocated for the vma.
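
      The back-pointer test can be sketched in user space as follows. The
      struct and function names are hypothetical stand-ins for the vma, its
      lock and the dup routine, and only the "clear if not ours" decision is
      modelled.

```c
#include <assert.h>
#include <stddef.h>

struct model_vma;

/* The lock records which vma it was allocated for. */
struct model_vma_lock {
    struct model_vma *vma;
};

/* Stand-in for a vma whose private data points at its lock. */
struct model_vma {
    struct model_vma_lock *private_lock;
};

/*
 * Clear the pointer only when it was inherited by copying another vma,
 * i.e. when the lock's back-pointer names a different vma. A lock that
 * already belongs to this vma is kept, so it is not leaked.
 */
static void model_dup_vma_private(struct model_vma *vma)
{
    assert(vma != NULL);
    if (vma->private_lock != NULL && vma->private_lock->vma != vma)
        vma->private_lock = NULL;
}
```

      A freshly copied vma still carries the original's lock pointer, so the
      back-pointer mismatch identifies it and the pointer is cleared; a vma
      that already owns its lock is left alone.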
      
      [1] https://lore.kernel.org/linux-mm/5154292a-4c55-28cd-0935-82441e512fc3@huawei.com/
      
      Link: https://lkml.kernel.org/r/20221019201957.34607-1-mike.kravetz@oracle.com
      Fixes: 131a79b4 ("hugetlb: fix vma lock handling during split vma and range unmapping")
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: James Houghton <jthoughton@google.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mina Almasry <almasrymina@google.com>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
      Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Prakash Sangappa <prakash.sangappa@oracle.com>
      Cc: Sven Schnelle <svens@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>