1. 20 Oct, 2022 8 commits
  2. 19 Oct, 2022 8 commits
  3. 18 Oct, 2022 1 commit
  4. 16 Oct, 2022 1 commit
  5. 12 Oct, 2022 6 commits
    • Merge tag 'nvme-6.1-2022-10-12' of git://git.infradead.org/nvme into block-6.1 · 3bc429c1
      Jens Axboe authored
      Pull NVMe fixes from Christoph:
      
      "nvme fixes for Linux 6.1
      
       - add NVME_QUIRK_BOGUS_NID for Lexar NM760 (Abhijit)
       - avoid the deepest sleep state on ZHITAI TiPro5000 SSDs (Xi Ruoyao)
       - fix possible hang caused during ctrl deletion (Sagi Grimberg)
       - fix possible hang in live ns resize with ANA access (Sagi Grimberg)"
      
      * tag 'nvme-6.1-2022-10-12' of git://git.infradead.org/nvme:
        nvme-multipath: fix possible hang in live ns resize with ANA access
        nvme-pci: avoid the deepest sleep state on ZHITAI TiPro5000 SSDs
        nvme-pci: add NVME_QUIRK_BOGUS_NID for Lexar NM760
        nvme-tcp: fix possible hang caused during ctrl deletion
        nvme-rdma: fix possible hang caused during ctrl deletion
      3bc429c1
    • nvme-multipath: fix possible hang in live ns resize with ANA access · 72e3b888
      Sagi Grimberg authored
      When we revalidate paths as part of an ns size change (as of commit
      e7d65803), it is possible that during the path revalidation, the
      only paths that are IO capable (i.e. optimized/non-optimized) are the
      ones that have not yet reported the ns resize to the host. This causes
      inflight requests to be requeued (as we have available paths but none
      are IO capable). These requests then sit on the requeue list, waiting
      for someone to resubmit them at some point.
      
      The IO capable paths will eventually notify the ns resize change to the
      host, but there is nothing that will kick the requeue list to resubmit
      the queued requests.
      
      Fix this by always kicking the requeue list, and if no IO capable path
      exists, these requests will be queued again.
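
The effect of always kicking the requeue list can be illustrated with a small toy model. This is only a sketch in Python, not the kernel code: the class `MultipathHead`, the function `revalidate`, and the `always_kick` flag are hypothetical names invented for illustration.

```python
from collections import deque


class MultipathHead:
    """Toy model of a multipath namespace head with a requeue list."""

    def __init__(self):
        self.paths = {}              # path name -> True if IO capable
        self.requeue_list = deque()  # requests parked for later resubmission
        self.completed = []          # requests that found a usable path

    def submit(self, request):
        # Complete the request if any path is IO capable, otherwise park it.
        if any(self.paths.values()):
            self.completed.append(request)
        else:
            self.requeue_list.append(request)

    def kick_requeue_list(self):
        # Resubmit everything that was parked. Requests are simply parked
        # again if there is still no IO-capable path.
        pending, self.requeue_list = self.requeue_list, deque()
        for request in pending:
            self.submit(request)


def revalidate(head, path, io_capable, always_kick):
    """Update one path's state and, with the fix, always kick the list."""
    head.paths[path] = io_capable
    if always_kick:
        head.kick_requeue_list()
```

Without the kick, a request parked while no path was usable stays on the list even after a path becomes IO capable again. With the kick, it is either resubmitted or parked again, so nothing is stranded.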
      
      A typical log that indicates that IOs are requeued:
      --
      nvme nvme1: creating 4 I/O queues.
      nvme nvme1: new ctrl: "testnqn1"
      nvme nvme2: creating 4 I/O queues.
      nvme nvme2: mapped 4/0/0 default/read/poll queues.
      nvme nvme2: new ctrl: NQN "testnqn1", addr 127.0.0.1:8009
      nvme nvme1: rescanning namespaces.
      nvme1n1: detected capacity change from 2097152 to 4194304
      block nvme1n1: no usable path - requeuing I/O
      block nvme1n1: no usable path - requeuing I/O
      block nvme1n1: no usable path - requeuing I/O
      block nvme1n1: no usable path - requeuing I/O
      block nvme1n1: no usable path - requeuing I/O
      block nvme1n1: no usable path - requeuing I/O
      block nvme1n1: no usable path - requeuing I/O
      block nvme1n1: no usable path - requeuing I/O
      block nvme1n1: no usable path - requeuing I/O
      block nvme1n1: no usable path - requeuing I/O
      nvme nvme2: rescanning namespaces.
      --
      Reported-by: Yogev Cohen <yogev@lightbitslabs.com>
      Fixes: e7d65803 ("nvme-multipath: revalidate paths during rescan")
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Cc: <stable@vger.kernel.org> # v5.15+
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      72e3b888
    • nvme-pci: avoid the deepest sleep state on ZHITAI TiPro5000 SSDs · d5d3c100
      Xi Ruoyao authored
      ZHITAI TiPro5000 SSDs have the same APST sleep problem as their cousin,
      the TiPro7000.  The quirk for the TiPro7000 was added in
      commit 6b961bce ("nvme-pci: avoid the deepest sleep state on
      ZHITAI TiPro7000 SSDs"); use the same quirk for the TiPro5000.
      
      The APST data from "nvme id-ctrl /dev/nvme1":
      
      vid       : 0x1e49
      ssvid     : 0x1e49
      sn        : ZTA21T0KA2227304LM
      mn        : ZHITAI TiPlus5000 1TB
      fr        : ZTA09139
      [...]
      ps    0 : mp:6.50W operational enlat:0 exlat:0 rrt:0 rrl:0
               rwt:0 rwl:0 idle_power:- active_power:-
      ps    1 : mp:5.80W operational enlat:0 exlat:0 rrt:1 rrl:1
               rwt:1 rwl:1 idle_power:- active_power:-
      ps    2 : mp:3.60W operational enlat:0 exlat:0 rrt:2 rrl:2
               rwt:2 rwl:2 idle_power:- active_power:-
      ps    3 : mp:0.0500W non-operational enlat:5000 exlat:10000 rrt:3 rrl:3
               rwt:3 rwl:3 idle_power:- active_power:-
      ps    4 : mp:0.0025W non-operational enlat:8000 exlat:45000 rrt:4 rrl:4
               rwt:4 rwl:4 idle_power:- active_power:-
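
To make the effect of the quirk concrete, the following Python sketch picks the deepest power state that autonomous transitions may target, using the latencies from the table above. It is a simplified model and not the driver's logic: the names `PowerState` and `deepest_apst_target` are hypothetical, and the 100000 us latency budget used in the usage note is an assumption.

```python
from dataclasses import dataclass


@dataclass
class PowerState:
    index: int
    operational: bool
    enlat_us: int   # entry latency in microseconds
    exlat_us: int   # exit latency in microseconds


def deepest_apst_target(states, max_latency_us, skip_deepest):
    """Return the index of the deepest usable idle state, or None."""
    deepest_index = max(s.index for s in states)
    for s in sorted(states, key=lambda s: s.index, reverse=True):
        if skip_deepest and s.index == deepest_index:
            continue  # quirk: never use the deepest state
        if s.operational:
            continue  # only non-operational states are idle targets
        if s.exlat_us > max_latency_us:
            continue  # waking up would exceed the latency budget
        return s.index
    return None


# Values copied from the "nvme id-ctrl" output above.
TIPRO5000_STATES = [
    PowerState(0, True, 0, 0),
    PowerState(1, True, 0, 0),
    PowerState(2, True, 0, 0),
    PowerState(3, False, 5000, 10000),
    PowerState(4, False, 8000, 45000),
]
```

Under these assumptions, `deepest_apst_target(TIPRO5000_STATES, 100000, skip_deepest=False)` selects ps 4, while passing `skip_deepest=True` limits the drive to ps 3.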
      Reported-and-tested-by: Chang Feng <flukehn@gmail.com>
      Signed-off-by: Xi Ruoyao <xry111@xry111.site>
      Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      d5d3c100
    • nvme-pci: add NVME_QUIRK_BOGUS_NID for Lexar NM760 · 80b26240
      Abhijit authored
      Add a quirk to fix Lexar NM760 SSD drives reporting duplicate nsids.
      Signed-off-by: Abhijit <abhijit@abhijittomar.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      80b26240
    • nvme-tcp: fix possible hang caused during ctrl deletion · c4abd875
      Sagi Grimberg authored
      When we delete a controller, we execute the following:
      1. nvme_stop_ctrl() - stop some work elements that may be
      	inflight or scheduled (specifically also .stop_ctrl
      	which cancels ctrl error recovery work)
      2. nvme_remove_namespaces() - which first flushes scan_work
      	to avoid competing ns addition/removal
      3. continue to teardown the controller
      
      However, if err_work was scheduled to run in (1), it is designed to
      cancel any inflight I/O, particularly I/O that is originating from ns
      scan_work in (2), but because it is cancelled in .stop_ctrl(), we can
      prevent forward progress of (2) as ns scanning is blocking on I/O
      (that will never be cancelled).
      
      The race is:
      1. transport layer error observed -> err_work is scheduled
      2. scan_work executes, discovers ns, generates I/O to it
      3. nvme_stop_ctrl() -> .stop_ctrl() -> cancel_work_sync(err_work)
         - err_work never executed
      4. nvme_remove_namespaces() -> flush_work(scan_work)
      --> deadlock, because scan_work is blocked on I/O that was supposed
      to be cancelled by err_work, but was cancelled before executing (see
      stack trace [1]).
      
      Fix this by flushing err_work instead of cancelling it, to force it
      to execute and cancel all inflight I/O.
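
The difference between cancelling and flushing a pending work item can be shown with a deterministic toy model. This Python sketch does not use real kernel workqueues or threads; the `Controller` class and `delete_ctrl` function are hypothetical stand-ins for the sequence described above.

```python
class Controller:
    """Toy controller: err_work cancels the I/O that scan_work waits on."""

    def __init__(self):
        self.inflight = set()          # I/O issued by scan_work
        self.err_work_pending = False  # err_work scheduled but not yet run

    def schedule_err_work(self):
        self.err_work_pending = True

    def _run_err_work(self):
        # Error recovery cancels every inflight request.
        self.inflight.clear()

    def cancel_err_work(self):
        # Old behaviour: drop the pending work without running it.
        self.err_work_pending = False

    def flush_err_work(self):
        # Fixed behaviour: run the pending work to completion first.
        if self.err_work_pending:
            self.err_work_pending = False
            self._run_err_work()

    def scan_work_blocked(self):
        return bool(self.inflight)


def delete_ctrl(ctrl, use_flush):
    """Return True if flushing scan_work could finish, False if it would hang."""
    if use_flush:
        ctrl.flush_err_work()
    else:
        ctrl.cancel_err_work()
    return not ctrl.scan_work_blocked()
```

In this model, scheduling `err_work`, adding an inflight request, and then calling `delete_ctrl(ctrl, use_flush=False)` leaves the request outstanding forever, which corresponds to the hang. With `use_flush=True` the request is cancelled and deletion can proceed.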
      
      [1]:
      --
      Call Trace:
       <TASK>
       __schedule+0x390/0x910
       ? scan_shadow_nodes+0x40/0x40
       schedule+0x55/0xe0
       io_schedule+0x16/0x40
       do_read_cache_page+0x55d/0x850
       ? __page_cache_alloc+0x90/0x90
       read_cache_page+0x12/0x20
       read_part_sector+0x3f/0x110
       amiga_partition+0x3d/0x3e0
       ? osf_partition+0x33/0x220
       ? put_partition+0x90/0x90
       bdev_disk_changed+0x1fe/0x4d0
       blkdev_get_whole+0x7b/0x90
       blkdev_get_by_dev+0xda/0x2d0
       device_add_disk+0x356/0x3b0
       nvme_mpath_set_live+0x13c/0x1a0 [nvme_core]
       ? nvme_parse_ana_log+0xae/0x1a0 [nvme_core]
       nvme_update_ns_ana_state+0x3a/0x40 [nvme_core]
       nvme_mpath_add_disk+0x120/0x160 [nvme_core]
       nvme_alloc_ns+0x594/0xa00 [nvme_core]
       nvme_validate_or_alloc_ns+0xb9/0x1a0 [nvme_core]
       ? __nvme_submit_sync_cmd+0x1d2/0x210 [nvme_core]
       nvme_scan_work+0x281/0x410 [nvme_core]
       process_one_work+0x1be/0x380
       worker_thread+0x37/0x3b0
       ? process_one_work+0x380/0x380
       kthread+0x12d/0x150
       ? set_kthread_struct+0x50/0x50
       ret_from_fork+0x1f/0x30
       </TASK>
      INFO: task nvme:6725 blocked for more than 491 seconds.
            Not tainted 5.15.65-f0.el7.x86_64 #1
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      task:nvme            state:D
       stack:    0 pid: 6725 ppid:  1761 flags:0x00004000
      Call Trace:
       <TASK>
       __schedule+0x390/0x910
       ? sched_clock+0x9/0x10
       schedule+0x55/0xe0
       schedule_timeout+0x24b/0x2e0
       ? try_to_wake_up+0x358/0x510
       ? finish_task_switch+0x88/0x2c0
       wait_for_completion+0xa5/0x110
       __flush_work+0x144/0x210
       ? worker_attach_to_pool+0xc0/0xc0
       flush_work+0x10/0x20
       nvme_remove_namespaces+0x41/0xf0 [nvme_core]
       nvme_do_delete_ctrl+0x47/0x66 [nvme_core]
       nvme_sysfs_delete.cold.96+0x8/0xd [nvme_core]
       dev_attr_store+0x14/0x30
       sysfs_kf_write+0x38/0x50
       kernfs_fop_write_iter+0x146/0x1d0
       new_sync_write+0x114/0x1b0
       ? intel_pmu_handle_irq+0xe0/0x420
       vfs_write+0x18d/0x270
       ksys_write+0x61/0xe0
       __x64_sys_write+0x1a/0x20
       do_syscall_64+0x37/0x90
       entry_SYSCALL_64_after_hwframe+0x61/0xcb
      --
      
      Fixes: 3f2304f8 ("nvme-tcp: add NVMe over TCP host driver")
      Reported-by: Jonathan Nicklin <jnicklin@blockbridge.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Tested-by: Jonathan Nicklin <jnicklin@blockbridge.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      c4abd875
    • nvme-rdma: fix possible hang caused during ctrl deletion · a1ae8d4d
      Sagi Grimberg authored
      When we delete a controller, we execute the following:
      1. nvme_stop_ctrl() - stop some work elements that may be
              inflight or scheduled (specifically also .stop_ctrl
              which cancels ctrl error recovery work)
      2. nvme_remove_namespaces() - which first flushes scan_work
              to avoid competing ns addition/removal
      3. continue to teardown the controller
      
      However, if err_work was scheduled to run in (1), it is designed to
      cancel any inflight I/O, particularly I/O that is originating from ns
      scan_work in (2), but because it is cancelled in .stop_ctrl(), we can
      prevent forward progress of (2) as ns scanning is blocking on I/O
      (that will never be cancelled).
      
      The race is:
      1. transport layer error observed -> err_work is scheduled
      2. scan_work executes, discovers ns, generates I/O to it
      3. nvme_stop_ctrl() -> .stop_ctrl() -> cancel_work_sync(err_work)
         - err_work never executed
      4. nvme_remove_namespaces() -> flush_work(scan_work)
      --> deadlock, because scan_work is blocked on I/O that was supposed
      to be cancelled by err_work, but was cancelled before executing.
      
      Fix this by flushing err_work instead of cancelling it, to force it
      to execute and cancel all inflight I/O.
      
      Fixes: b435ecea ("nvme: Add .stop_ctrl to nvme ctrl ops")
      Fixes: f6c8e432 ("nvme: flush namespace scanning work just before removing namespaces")
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      a1ae8d4d
  6. 10 Oct, 2022 3 commits
  7. 09 Oct, 2022 11 commits
    • Merge tag 'ucount-rlimits-cleanups-for-v5.19' of... · 493ffd66
      Linus Torvalds authored
      Merge tag 'ucount-rlimits-cleanups-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
      
      Pull ucounts update from Eric Biederman:
       "Split rlimit and ucount values and max values
      
        After the ucount rlimit code was merged a bunch of small but
        significant bugs were found and fixed. At the time it was realized
        that part of the problem was that while the ucount rlimits were very
        similar to the ordinary ucounts (in being nested counts with limits)
        the semantics were slightly different and the code would be less error
        prone if there was less sharing.
      
        This is the long awaited cleanup that should hopefully keep things
        more comprehensible and less error prone for whoever needs to touch
        that code next"
      
      * tag 'ucount-rlimits-cleanups-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
        ucounts: Split rlimit and ucount values and max values
      493ffd66
    • Merge tag 'signal-for-v5.20' of... · e572410e
      Linus Torvalds authored
      Merge tag 'signal-for-v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
      
      Pull ptrace update from Eric Biederman:
       "ptrace: Stop supporting SIGKILL for PTRACE_EVENT_EXIT
      
        Recently I had a conversation where it was pointed out to me that
        SIGKILL sent to a tracee stopped in PTRACE_EVENT_EXIT is quite
        difficult for a tracer to handle.
      
        Keeping SIGKILL working after the process has been killed is a pain
        from an implementation point of view.
      
        So since the debuggers don't want this behavior let's see if we can
        remove this wart for the userspace API
      
        If a regression is detected it should only need to be the last change
        that is reverted. The other two are just general cleanups that
        make the last patch simpler"
      
      * tag 'signal-for-v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
        signal: Drop signals received after a fatal signal has been processed
        signal: Guarantee that SIGNAL_GROUP_EXIT is set on process exit
        signal: Ensure SIGNAL_GROUP_EXIT gets set in do_group_exit
      e572410e
    • Merge tag 'retire_mq_sysctls-for-v5.19' of... · 86fb9c53
      Linus Torvalds authored
      Merge tag 'retire_mq_sysctls-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
      
      Pull mqueue fix from Eric Biederman:
       "A fix for an unlikely but possible memory leak"
      
      * tag 'retire_mq_sysctls-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
        ipc: mqueue: fix possible memory leak in init_mqueue_fs()
      86fb9c53
    • Merge tag 'interrupting_kthread_stop-for-v5.20' of... · c71370bd
      Linus Torvalds authored
      Merge tag 'interrupting_kthread_stop-for-v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
      
      Pull kthread update from Eric Biederman:
       "Break out of wait loops on kthread_stop()
      
        This is a small tweak to kthread_stop so it breaks out of
        interruptible waits that don't explicitly test for kthread_stop.
      
        These interruptible waits occasionally occur in kernel threads due to
        code sharing"
      
      * tag 'interrupting_kthread_stop-for-v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
        signal: break out of wait loops on kthread_stop()
      c71370bd
    • Merge tag 'powerpc-6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 4899a36f
      Linus Torvalds authored
      Pull powerpc updates from Michael Ellerman:
      
       - Remove our now never-true definitions for pgd_huge() and p4d_leaf().
      
       - Add pte_needs_flush() and huge_pmd_needs_flush() for 64-bit.
      
       - Add support for syscall wrappers.
      
       - Add support for KFENCE on 64-bit.
      
       - Update 64-bit HV KVM to use the new guest state entry/exit accounting
         API.
      
       - Support execute-only memory when using the Radix MMU (P9 or later).
      
       - Implement CONFIG_PARAVIRT_TIME_ACCOUNTING for pseries guests.
      
       - Updates to our linker script to move more data into read-only
         sections.
      
       - Allow the VDSO to be randomised on 32-bit.
      
       - Many other small features and fixes.
      
      Thanks to Andrew Donnellan, Aneesh Kumar K.V, Arnd Bergmann, Athira
      Rajeev, Christophe Leroy, David Hildenbrand, Disha Goel, Fabiano Rosas,
      Gaosheng Cui, Gustavo A. R. Silva, Haren Myneni, Hari Bathini, Jilin
      Yuan, Joel Stanley, Kajol Jain, Kees Cook, Krzysztof Kozlowski, Laurent
      Dufour, Liang He, Li Huafei, Lukas Bulwahn, Madhavan Srinivasan, Nathan
      Chancellor, Nathan Lynch, Nicholas Miehlbradt, Nicholas Piggin, Pali
      Rohár, Rohan McLure, Russell Currey, Sachin Sant, Segher Boessenkool,
      Shrikanth Hegde, Tyrel Datwyler, Wolfram Sang, ye xingchen, and Zheng
      Yongjun.
      
      * tag 'powerpc-6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (214 commits)
        KVM: PPC: Book3S HV: Fix stack frame regs marker
        powerpc: Don't add __powerpc_ prefix to syscall entry points
        powerpc/64s/interrupt: Fix stack frame regs marker
        powerpc/64: Fix msr_check_and_set/clear MSR[EE] race
        powerpc/64s/interrupt: Change must-hard-mask interrupt check from BUG to WARN
        powerpc/pseries: Add firmware details to the hardware description
        powerpc/powernv: Add opal details to the hardware description
        powerpc: Add device-tree model to the hardware description
        powerpc/64: Add logical PVR to the hardware description
        powerpc: Add PVR & CPU name to hardware description
        powerpc: Add hardware description string
        powerpc/configs: Enable PPC_UV in powernv_defconfig
        powerpc/configs: Update config files for removed/renamed symbols
        powerpc/mm: Fix UBSAN warning reported on hugetlb
        powerpc/mm: Always update max/min_low_pfn in mem_topology_setup()
        powerpc/mm/book3s/hash: Rename flush_tlb_pmd_range
        powerpc: Drops STABS_DEBUG from linker scripts
        powerpc/64s: Remove lost/old comment
        powerpc/64s: Remove old STAB comment
        powerpc: remove orphan systbl_chk.sh
        ...
      4899a36f
    • Merge tag 's390-6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · 03785a69
      Linus Torvalds authored
      Pull s390 updates from Vasily Gorbik:
      
       - Make use of the IBM z16 processor activity instrumentation facility
         extension to count neural network processor assist operations: add a
         new PMU device driver so that perf can make use of this.
      
       - Rework memcpy_real() to avoid DAT-off mode.
      
       - Rework absolute lowcore access code.
      
       - Various small fixes and improvements all over the code.
      
      * tag 's390-6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        s390/pci: remove unused bus_next field from struct zpci_dev
        s390/cio: remove unused ccw_device_force_console() declaration
        s390/pai: Add support for PAI Extension 1 NNPA counters
        s390/mm: fix no previous prototype warnings in maccess.c
        s390/mm: uninline copy_oldmem_kernel() function
        s390/mm,ptdump: add real memory copy page markers
        s390/mm: rework memcpy_real() to avoid DAT-off mode
        s390/dump: save IPL CPU registers once DAT is available
        s390/pci: convert high_memory to physical address
        s390/smp,ptdump: add absolute lowcore markers
        s390/smp: rework absolute lowcore access
        s390/smp: call smp_reinit_ipl_cpu() before scheduler is available
        s390/ptdump: add missing amode31 markers
        s390/mm: split lowcore pages with set_memory_4k()
        s390/mm: remove unused access parameter from do_fault_error()
        s390/delay: sync comment within __delay() with reality
        s390: move from strlcpy with unused retval to strscpy
      03785a69
    • Merge tag 'riscv-for-linus-6.1-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · 2e64066d
      Linus Torvalds authored
      Pull RISC-V updates from Palmer Dabbelt:
      
       - Improvements to the CPU topology subsystem, which fix some issues
         where RISC-V would report bad topology information.
      
       - The default NR_CPUS has increased to XLEN, and the maximum
         configurable value is 512.
      
       - The CD-ROM filesystems have been enabled in the defconfig.
      
       - Support for THP_SWAP has been added for rv64 systems.
      
      There are also a handful of cleanups and fixes throughout the tree.
      
      * tag 'riscv-for-linus-6.1-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
        riscv: enable THP_SWAP for RV64
        RISC-V: Print SSTC in canonical order
        riscv: compat: s/failed/unsupported if compat mode isn't supported
        RISC-V: Increase range and default value of NR_CPUS
        cpuidle: riscv-sbi: Fix CPU_PM_CPU_IDLE_ENTER_xyz() macro usage
        perf: RISC-V: throttle perf events
        perf: RISC-V: exclude invalid pmu counters from SBI calls
        riscv: enable CD-ROM file systems in defconfig
        riscv: topology: fix default topology reporting
        arm64: topology: move store_cpu_topology() to shared code
      2e64066d
    • Merge tag 'microblaze-v6.1' of git://git.monstr.eu/linux-2.6-microblaze · 57c92724
      Linus Torvalds authored
      Pull microblaze updates from Michal Simek:
       "This adds architecture support for error injection which can be done
        only via local memory (BRAM) with enabling path for recovery after
        reset.
      
        These patches target the Triple Modular Redundancy (TMR) configuration
        where 3 Microblazes are running in parallel with monitoring logic.
      
        When an error happens (or is injected) the system goes to the break
        handler with a full CPU reset and system recovery back to the original
        context. More information can be found at [1]"
      
      Link: https://www.xilinx.com/content/dam/xilinx/support/documents/ip_documentation/tmr/v1_0/pg268-tmr.pdf [1]
      
      * tag 'microblaze-v6.1' of git://git.monstr.eu/linux-2.6-microblaze:
        microblaze: Add support for error injection
        microblaze: Add custom break vector handler for mb manager
        microblaze: Add xmb_manager_register function
      57c92724
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · ef688f8b
      Linus Torvalds authored
      Pull kvm updates from Paolo Bonzini:
       "The first batch of KVM patches, mostly covering x86.
      
        ARM:
      
         - Account stage2 page table allocations in memory stats
      
        x86:
      
         - Account EPT/NPT page table allocations in memory stats
      
         - Tracepoint cleanups/fixes for nested VM-Enter and emulated MSR
           accesses
      
         - Drop eVMCS controls filtering for KVM on Hyper-V, all known
           versions of Hyper-V now support eVMCS fields associated with
           features that are enumerated to the guest
      
         - Use KVM's sanitized VMCS config as the basis for the values of
           nested VMX capabilities MSRs
      
         - A myriad event/exception fixes and cleanups. Most notably, pending
           exceptions morph into VM-Exits earlier, as soon as the exception is
           queued, instead of waiting until the next vmentry. This fixed a
           longstanding issue where the exceptions would incorrectly become
           double-faults instead of triggering a vmexit; the common case of
           page-fault vmexits had a special workaround, but now it's fixed for
           good
      
         - A handful of fixes for memory leaks in error paths
      
         - Cleanups for VMREAD trampoline and VMX's VM-Exit assembly flow
      
         - Never write to memory from non-sleepable kvm_vcpu_check_block()
      
         - Selftests refinements and cleanups
      
         - Misc typo cleanups
      
        Generic:
      
         - remove KVM_REQ_UNHALT"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (94 commits)
        KVM: remove KVM_REQ_UNHALT
        KVM: mips, x86: do not rely on KVM_REQ_UNHALT
        KVM: x86: never write to memory from kvm_vcpu_check_block()
        KVM: x86: Don't snapshot pending INIT/SIPI prior to checking nested events
        KVM: nVMX: Make event request on VMXOFF iff INIT/SIPI is pending
        KVM: nVMX: Make an event request if INIT or SIPI is pending on VM-Enter
        KVM: SVM: Make an event request if INIT or SIPI is pending when GIF is set
        KVM: x86: lapic does not have to process INIT if it is blocked
        KVM: x86: Rename kvm_apic_has_events() to make it INIT/SIPI specific
        KVM: x86: Rename and expose helper to detect if INIT/SIPI are allowed
        KVM: nVMX: Make an event request when pending an MTF nested VM-Exit
        KVM: x86: make vendor code check for all nested events
        mailmap: Update Oliver's email address
        KVM: x86: Allow force_emulation_prefix to be written without a reload
        KVM: selftests: Add an x86-only test to verify nested exception queueing
        KVM: selftests: Use uapi header to get VMX and SVM exit reasons/codes
        KVM: x86: Rename inject_pending_events() to kvm_check_and_inject_events()
        KVM: VMX: Update MTF and ICEBP comments to document KVM's subtle behavior
        KVM: x86: Treat pending TRIPLE_FAULT requests as pending exceptions
        KVM: x86: Morph pending exceptions to pending VM-Exits at queue time
        ...
      ef688f8b
    • Merge tag 'efi-next-for-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi · 0e470763
      Linus Torvalds authored
      Pull EFI updates from Ard Biesheuvel:
       "A bit more going on than usual in the EFI subsystem. The main driver
        for this has been the introduction of the LoongArch architecture last
        cycle, which inspired some cleanup and refactoring of the EFI code.
        Another driver for EFI changes this cycle and in the future is
        confidential compute.
      
        The LoongArch architecture does not use either struct bootparams or DT
        natively [yet], and so passing information between the EFI stub and
        the core kernel using either of those is undesirable. And in general,
        overloading DT has been a source of issues on arm64, so using DT for
        this on new architectures is something to avoid for the time being (even if we
        might converge on something DT based for non-x86 architectures in the
        future). For this reason, in addition to the patch that enables EFI
        boot for LoongArch, there are a number of refactoring patches applied
        on top of which separate the DT bits from the generic EFI stub bits.
        These changes are on a separate topic branch that has been shared
        with the LoongArch maintainers, who will include it in their pull
        request as well. This is not ideal, but the best way to manage the
        conflicts without stalling LoongArch for another cycle.
      
        Another development inspired by LoongArch is the newly added support
        for EFI based decompressors. Instead of adding yet another
        arch-specific incarnation of this pattern for LoongArch, we are
        introducing an EFI app based on the existing EFI libstub
        infrastructure that encapsulates the decompression code we use on other
        architectures, but in a way that is fully generic. This has been
        developed and tested in collaboration with distro and systemd folks,
        who are eager to start using this for systemd-boot and also for arm64
        secure boot on Fedora. Note that the EFI zimage files this introduces
        can also be decompressed by non-EFI bootloaders if needed, as the
        image header describes the location of the payload inside the image,
        and the type of compression that was used. (Note that Fedora's arm64
        GRUB is buggy [0] so you'll need a recent version or switch to
        systemd-boot in order to use this.)
      
        Finally, we are adding TPM measurement of the kernel command line
        provided by EFI. There is an oversight in the TCG spec which results
        in a blind spot for command line arguments passed to loaded images,
        which means that either the loader or the stub needs to take the
        measurement. Given the combinatorial explosion I am anticipating when
        it comes to firmware/bootloader stacks and firmware based attestation
        protocols (SEV-SNP, TDX, DICE, DRTM), it is good to set a baseline now
        when it comes to EFI measured boot, which is that the kernel measures
        the initrd and command line. Intermediate loaders can measure
        additional assets if needed, but with the baseline in place, we can
        deploy measured boot in a meaningful way even if you boot into Linux
        straight from the EFI firmware.
      
        Summary:
      
         - implement EFI boot support for LoongArch
      
         - implement generic EFI compressed boot support for arm64, RISC-V and
           LoongArch, none of which implement a decompressor today
      
         - measure the kernel command line into the TPM if measured boot is in
           effect
      
         - refactor the EFI stub code in order to isolate DT dependencies for
           architectures other than x86
      
         - avoid calling SetVirtualAddressMap() on arm64 if the configured
           size of the VA space guarantees that doing so is unnecessary
      
         - move some ARM specific code out of the generic EFI source files
      
         - unmap kernel code from the x86 mixed mode 1:1 page tables"
      
      * tag 'efi-next-for-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: (24 commits)
        efi/arm64: libstub: avoid SetVirtualAddressMap() when possible
        efi: zboot: create MemoryMapped() device path for the parent if needed
        efi: libstub: fix up the last remaining open coded boot service call
        efi/arm: libstub: move ARM specific code out of generic routines
        efi/libstub: measure EFI LoadOptions
        efi/libstub: refactor the initrd measuring functions
        efi/loongarch: libstub: remove dependency on flattened DT
        efi: libstub: install boot-time memory map as config table
        efi: libstub: remove DT dependency from generic stub
        efi: libstub: unify initrd loading between architectures
        efi: libstub: remove pointless goto kludge
        efi: libstub: simplify efi_get_memory_map() and struct efi_boot_memmap
        efi: libstub: avoid efi_get_memory_map() for allocating the virt map
        efi: libstub: drop pointless get_memory_map() call
        efi: libstub: fix type confusion for load_options_size
        arm64: efi: enable generic EFI compressed boot
        loongarch: efi: enable generic EFI compressed boot
        riscv: efi: enable generic EFI compressed boot
        efi/libstub: implement generic EFI zboot
        efi/libstub: move efi_system_table global var into separate object
        ...
      0e470763
    • blk-wbt: fix that 'rwb->wc' is always set to 1 in wbt_init() · 285febab
      Yu Kuai authored
      Commit 8c5035df ("blk-wbt: call rq_qos_add() after wb_normal is
      initialized") moved wbt_set_write_cache() before rq_qos_add(), which
      is wrong because wbt_rq_qos() is still NULL at that point.
      
      Fix the problem by removing wbt_set_write_cache() and setting 'rwb->wc'
      directly. Note that this patch also removes the redundant setting of
      'rwb->wc'.
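
The ordering problem can be reproduced in a few lines. The Python sketch below is a toy model under stated assumptions, not the block-layer code: `Queue`, `Rwb`, `set_write_cache`, `wbt_init_buggy`, and `wbt_init_fixed` are illustrative names, and the initial value of `wc` is assumed to be 1 (true), as the commit title suggests.

```python
class Queue:
    """Toy request queue: knows its write-cache flag and its registered rwb."""

    def __init__(self, write_cache):
        self.write_cache = write_cache
        self.rq_qos = None  # set only once the rwb has been registered


class Rwb:
    def __init__(self):
        self.wc = True  # assumed initial value before configuration


def set_write_cache(queue, on):
    # Looks the rwb up through the queue; silently does nothing if the
    # rwb has not been registered yet.
    if queue.rq_qos is not None:
        queue.rq_qos.wc = on


def wbt_init_buggy(queue):
    rwb = Rwb()
    set_write_cache(queue, queue.write_cache)  # no-op: nothing registered yet
    queue.rq_qos = rwb                         # registration happens too late
    return rwb


def wbt_init_fixed(queue):
    rwb = Rwb()
    rwb.wc = queue.write_cache  # set the field directly, no lookup needed
    queue.rq_qos = rwb
    return rwb
```

For a queue without a write cache, the buggy variant leaves `wc` at its initial value, while the fixed variant records the real setting.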
      
      Fixes: 8c5035df ("blk-wbt: call rq_qos_add() after wb_normal is initialized")
      Reported-by: kernel test robot <yujie.liu@intel.com>
      Link: https://lore.kernel.org/r/202210081045.77ddf59b-yujie.liu@intel.com
      Signed-off-by: Yu Kuai <yukuai3@huawei.com>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Link: https://lore.kernel.org/r/20221009101038.1692875-1-yukuai1@huaweicloud.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      285febab
  8. 08 Oct, 2022 2 commits
    • Linus Torvalds's avatar
      Merge tag 'mailbox-v6.1' of git://git.linaro.org/landing-teams/working/fujitsu/integration · a6afa419
      Linus Torvalds authored
      Pull mailbox updates from Jassi Brar:
      
       - apple: implement poll and flush callbacks
      
       - qcom: fix clocks for IPQ6018 and IPQ8074; flag irq handler as not-a-thread
      
       - microchip: split reg-space into two
      
       - imx: RST channel fix
      
       - bcm: fix dma_map_sg error handling
      
       - misc: spelling fix in pcc driver
      
      * tag 'mailbox-v6.1' of git://git.linaro.org/landing-teams/working/fujitsu/integration:
        mailbox: qcom-ipcc: flag IRQ NO_THREAD
        mailbox: pcc: Fix spelling mistake "Plaform" -> "Platform"
        mailbox: bcm-ferxrm-mailbox: Fix error check for dma_map_sg
        mailbox: qcom-apcs-ipc: add IPQ8074 APSS clock support
        dt-bindings: mailbox: qcom: correct clocks for IPQ6018 and IPQ8074
        dt-bindings: mailbox: qcom: set correct #clock-cells
        mailbox: mpfs: account for mbox offsets while sending
        mailbox: mpfs: fix handling of the reg property
        dt-bindings: mailbox: fix the mpfs' reg property
        mailbox: imx: fix RST channel support
        mailbox: apple: Implement poll_data() operation
        mailbox: apple: Implement flush() operation
      a6afa419
    • Linus Torvalds's avatar
      Merge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux · bdc753c7
      Linus Torvalds authored
      Pull clk updates from Stephen Boyd:
       "We have some late breaking reports that a patch series to rework clk
        rate range support broke boot on some devices, so I've left that
        branch out of this. Hopefully we can get to that next week, or punt on
        it and let it bake another cycle. That means we don't really have any
        changes to the core framework this time around besides a few typo
        fixes. Instead this is all clk driver updates and fixes.
      
        The usual suspects are here (again), with Qualcomm dominating the
        diffstat. We look to have gained support for quite a few new Qualcomm
        SoCs and Dmitry worked on updating many of the existing Qualcomm
        drivers to use clk_parent_data. After that we have MediaTek drivers
        getting some much needed updates, in particular to support GPU DVFS.
        There are also quite a few Samsung clk driver patches, but that's
        mostly because there was a maintainer change and so last release we
        missed some of those patches.
      
        Overall things look normal, but I'm slowly reviewing core framework
        code nowadays and that shows given the rate range patches had to be
        yanked last minute. Let's hope this situation changes soon.
      
        New Drivers:
         - Support for Renesas VersaClock7 clock generator family
         - Add Spreadtrum UMS512 SoC clk support
         - New clock drivers for MediaTek Helio X10 MT6795
         - Display clks for Qualcomm SM6115, SM8450
         - GPU clks for Qualcomm SC8280XP
         - Qualcomm MSM8909 and SM6375 global and SMD RPM clk drivers
      
        Deleted Drivers:
         - Remove DaVinci DM644x and DM646x clk driver support
      
        Updates:
         - Convert Baikal-T1 CCU driver to platform driver
         - Split reset support out of primary Baikal-T1 CCU driver
         - Add some missing clks required for RPiVid Video Decoder on
           RaspberryPi
         - Mark PLLC critical on bcm2835
         - More devm helpers for fixed rate registration
         - Various PXA168 clk driver fixes
         - Add resets for MediaTek MT8195 PCIe and USB
         - Miscellaneous of_node_put() fixes
         - Nuke dt-bindings/clk path (again) by moving headers to
           dt-bindings/clock
         - Convert gpio-clk-gate binding to YAML
         - Various fixes to AMD/Xilinx Zynqmp clk driver
         - Graduate AMD/Xilinx "clocking wizard" driver from staging
         - Add missing DPI1_HDMI clock in MT8195 VDOSYS1
         - Clock driver changes to support GPU DVFS on MT8183, MT8192, MT8195
         - Fix GPU clock topology on MT8195
         - Propagate rate changes from GPU clock gate up the tree
         - Clock mux notifiers for GPU-related PLLs
         - Conversion of more "simple" drivers to mtk_clk_simple_probe()
         - Hook up mtk_clk_simple_remove() for "simple" MT8192 clock drivers
         - Fixes to previous |struct clk| to |struct clk_hw| conversion on
           MediaTek
         - Shrink MT8192 clock driver by deduplicating clock parent lists
         - Change order between 'sim_enet_root_clk' and 'enet_qos_root_clk'
           clocks for i.MX8MP
         - Drop unnecessary newline in i.MX8MM dt-bindings
         - Add more MU1 and SAI clocks dt-bindings IDs
         - Introduce slice busy bit check for i.MX93 composite clock
         - Introduce white list bit check for i.MX93 composite clock
         - Add new i.MX93 clock gate
         - Add MU1 and MU2 clocks to i.MX93 clock provider
         - Add SAI IPG clocks to i.MX93 clock provider
         - add generic clocks for U(S)ART available on SAMA5D2 SoCs
         - reset controller support for Polarfire clocks
         - .round_rate and .set_rate support for clk-mpfs
         - code cleanup for clk-mpfs
         - PLL support for PolarFire SoC's Clock Conditioning Circuitry
         - Add watchdog, I2C, pin control/GPIO, and Ethernet clocks on R-Car
           V4H
         - Add SDHI, Timer (CMT/TMU), and SPI (MSIOF) clocks on R-Car S4-8
         - Add I2C clocks and resets on RZ/V2M
         - Document clock support for the RZ/Five SoC
         - mux-variant clock using the table variant to select parents
         - clock controller for the rv1126 soc
         - conversion of rk3128 to yaml and relicensing of the yaml bindings
           to gpl2+MIT (following dt-binding guidelines)
         - Exynos7885: add FSYS, TREX and MFC clock controllers
         - Exynos850: add IS and AUD (audio) clock controllers with bindings
         - ExynosAutov9: add FSYS clock controllers with bindings
         - ExynosAutov9: correct clock IDs in bindings of Peric 0 and 1 clock
           controllers, due to duplicated entries. This is an acceptable ABI
           break: recently developed/added platform so without legacies, acked
           by known users/developers
         - ExynosAutov9: add few missing Peric 0/1 gates
         - ExynosAutov9: correct register offsets of few Peric 0/1 clocks
         - Minor code improvements (use of_device_get_match_data() helper,
           code style)
         - Add Krzysztof Kozlowski as co-maintainer of Samsung SoC clocks, as
           he already maintains that architecture/platform
         - Keep Qualcomm GDSCs enabled when PWRSTS_RET flag is there, solving
           retention issues during suspend of USB on Qualcomm sc7180/sc7280
           and SC8280XP
         - Qualcomm SM6115 and QCM2260 are moved to reuse PLL configuration
         - Qualcomm SDM660 SDCC1 moved to floor clk ops
         - Support for the APCS PLLs for Qualcomm IPQ8064, IPQ8074 and IPQ6018
           was added/fixed
         - The Qualcomm MSM8996 CPU clocks are updated with support for ACD
         - Support for Qualcomm SDM670 GCC and RPMh clks was added
         - Transition to parent_data, parent_hws and use of ARRAY_SIZE() for
           num_parents was done for many Qualcomm SoCs
         - Support for per-reset defined delay on Qualcomm was introduced"
      
      * tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (283 commits)
        clk: qcom: gcc-sm6375: Ensure unsigned long type
        clk: qcom: gcc-sm6375: Remove unused variables
        clk: qcom: kpss-xcc: convert to parent data API
        clk: introduce (devm_)hw_register_mux_parent_data_table API
        clk: allow building lan966x as a module
        clk: clk-xgene: simplify if-if to if-else
        clk: ast2600: BCLK comes from EPLL
        clk: clocking-wizard: Depend on HAS_IOMEM
        clk: clocking-wizard: Use dev_err_probe() helper
        clk: nxp: fix typo in comment
        clk: pxa: add a check for the return value of kzalloc()
        clk: vc5: Add support for IDT/Renesas VersaClock 5P49V6975
        dt-bindings: clock: vc5: Add 5P49V6975
        clk: mvebu: armada-37xx-tbg: Remove the unneeded result variable
        clk: ti: dra7-atl: Fix reference leak in of_dra7_atl_clk_probe
        clk: Renesas versaclock7 ccf device driver
        dt-bindings: Renesas versaclock7 device tree bindings
        clk: ti: Balance of_node_get() calls for of_find_node_by_name()
        clk: imx: scu: fix memleak on platform_device_add() fails
        clk: vc5: Use regmap_{set,clear}_bits() where appropriate
        ...
      bdc753c7