1. 30 Jul, 2021 1 commit
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha · cade08a5
      Linus Torvalds authored
      Pull alpha updates from Matt Turner:
       "They're mostly small janitorial fixes but there's also more important
        ones:
      
         - drop the alpha-specific x86 binary loader (David Hildenbrand)
      
         - regression fix for at least Marvel platforms (Mike Rapoport)
      
         - fix for a scary-looking typo (Zheng Yongjun)"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha:
        alpha: register early reserved memory in memblock
        alpha: fix spelling mistakes
        alpha: Remove space between * and parameter name
        alpha: fp_emul: avoid init/cleanup_module names
        alpha: Add syscall_get_return_value()
        binfmt: remove support for em86 (alpha only)
        alpha: fix typos in a comment
        alpha: defconfig: add necessary configs for boot testing
        alpha: Send stop IPI to send to online CPUs
        alpha: convert comma to semicolon
        alpha: remove undef inline in compiler.h
        alpha: Kconfig: Replace HTTP links with HTTPS ones
        alpha: __udiv_qrnnd should be exported
  2. 29 Jul, 2021 3 commits
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 7e96bf47
      Linus Torvalds authored
      Pull kvm fixes from Paolo Bonzini:
       "ARM:
      
         - Fix MTE shared page detection
      
         - Enable selftest's use of PMU registers when asked to
      
        s390:
      
         - restore 5.13 debugfs names
      
        x86:
      
         - fix sizes for vcpu-id indexed arrays
      
         - fixes for AMD virtualized LAPIC (AVIC)
      
         - other small bugfixes
      
        Generic:
      
         - access tracking performance test
      
         - dirty_log_perf_test command line parsing fix
      
         - Fix selftest use of obsolete pthread_yield() in favour of
           sched_yield()
      
         - use cpu_relax when halt polling
      
         - fixed missing KVM_CLEAR_DIRTY_LOG compat ioctl"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: add missing compat KVM_CLEAR_DIRTY_LOG
        KVM: use cpu_relax when halt polling
        KVM: SVM: use vmcb01 in svm_refresh_apicv_exec_ctrl
        KVM: SVM: tweak warning about enabled AVIC on nested entry
        KVM: SVM: svm_set_vintr don't warn if AVIC is active but is about to be deactivated
        KVM: s390: restore old debugfs names
        KVM: SVM: delay svm_vcpu_init_msrpm after svm->vmcb is initialized
        KVM: selftests: Introduce access_tracking_perf_test
        KVM: selftests: Fix missing break in dirty_log_perf_test arg parsing
        x86/kvm: fix vcpu-id indexed array sizes
        KVM: x86: Check the right feature bit for MSR_KVM_ASYNC_PF_ACK access
        docs: virt: kvm: api.rst: replace some characters
        KVM: Documentation: Fix KVM_CAP_ENFORCE_PV_FEATURE_CPUID name
        KVM: nSVM: Swap the parameter order for svm_copy_vmrun_state()/svm_copy_vmloadsave_state()
        KVM: nSVM: Rename nested_svm_vmloadsave() to svm_copy_vmloadsave_state()
        KVM: arm64: selftests: get-reg-list: actually enable pmu regs in pmu sublist
        KVM: selftests: change pthread_yield to sched_yield
        KVM: arm64: Fix detection of shared VMAs on guest fault
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu · 2b99c470
      Linus Torvalds authored
      Pull m68knommu fix from Greg Ungerer:
       "A single compile time fix"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
        m68k/coldfire: change pll var. to clk_pll
    • alpha: register early reserved memory in memblock · 640b7ea5
      Mike Rapoport authored
      The memory reserved by console/PALcode or non-volatile memory is not added
      to memblock.memory.
      
      Since commit fa3354e4 (mm: free_area_init: use maximal zone PFNs rather
      than zone sizes) the initialization of the memory map relies on the
      accuracy of memblock.memory to properly calculate zone sizes. The holes in
      memblock.memory caused by absent regions reserved by the firmware cause
      incorrect initialization of struct pages which leads to BUG() during the
      initial page freeing:
      
      BUG: Bad page state in process swapper  pfn:2ffc53
      page:fffffc000ecf14c0 refcount:0 mapcount:1 mapping:0000000000000000 index:0x0
      flags: 0x0()
      raw: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
      raw: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
      page dumped because: nonzero mapcount
      Modules linked in:
      CPU: 0 PID: 0 Comm: swapper Not tainted 5.7.0-03841-gfa3354e4-dirty #26
             fffffc0001b5bd68 fffffc0001b5be80 fffffc00011cd148 fffffc000ecf14c0
             fffffc00019803df fffffc0001b5be80 fffffc00011ce340 fffffc000ecf14c0
             0000000000000000 fffffc0001b5be80 fffffc0001b482c0 fffffc00027d6618
             fffffc00027da7d0 00000000002ff97a 0000000000000000 fffffc0001b5be80
             fffffc00011d1abc fffffc000ecf14c0 fffffc0002d00000 fffffc0001b5be80
             fffffc0001b2350c 0000000000300000 fffffc0001b48298 fffffc0001b482c0
      Trace:
      [<fffffc00011cd148>] bad_page+0x168/0x1b0
      [<fffffc00011ce340>] free_pcp_prepare+0x1e0/0x290
      [<fffffc00011d1abc>] free_unref_page+0x2c/0xa0
      [<fffffc00014ee5f0>] cmp_ex_sort+0x0/0x30
      [<fffffc00014ee5f0>] cmp_ex_sort+0x0/0x30
      [<fffffc000101001c>] _stext+0x1c/0x20
      
      Fix this by registering the reserved ranges in memblock.memory.
      
      Link: https://lore.kernel.org/lkml/20210726192311.uffqnanxw3ac5wwi@ivybridge
      Fixes: fa3354e4 ("mm: free_area_init: use maximal zone PFNs rather than zone sizes")
      Reported-by: Matt Turner <mattst88@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Matt Turner <mattst88@gmail.com>
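The effect of the missing ranges described in the alpha memblock commit can be pictured with a toy model. This is only a sketch: `struct pfn_range`, `pfn_is_registered()` and `count_holes()` are hypothetical names, the PFN values are made up, and none of this is the kernel's memblock API. It simply shows that a firmware-reserved range left out of the list appears as a hole, and that registering it closes the hole.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for memblock.memory: a list of [start, end) PFN ranges.
 * Hypothetical illustration only, not the kernel's memblock API. */
struct pfn_range {
	unsigned long start;
	unsigned long end;	/* exclusive */
};

/* Return true if pfn falls inside any registered range. */
static bool pfn_is_registered(const struct pfn_range *r, size_t n,
			      unsigned long pfn)
{
	for (size_t i = 0; i < n; i++)
		if (pfn >= r[i].start && pfn < r[i].end)
			return true;
	return false;
}

/* Count the PFNs in [lo, hi) that no registered range covers (the holes). */
static unsigned long count_holes(const struct pfn_range *r, size_t n,
				 unsigned long lo, unsigned long hi)
{
	unsigned long holes = 0;

	for (unsigned long pfn = lo; pfn < hi; pfn++)
		if (!pfn_is_registered(r, n, pfn))
			holes++;
	return holes;
}
```

With ranges {0, 100} and {120, 200} the span 0..200 has a 20-page hole; adding the reserved range {100, 120} to the list leaves none.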
  3. 28 Jul, 2021 6 commits
  4. 27 Jul, 2021 13 commits
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma · 7d549995
      Linus Torvalds authored
      Pull rdma fixes from Jason Gunthorpe:
       "Nothing very exciting here, mainly just a bunch of irdma fixes. irdma
        is a new driver this cycle so that is to be expected.
      
         - Many more irdma fixups from bots/etc
      
         - bnxt_re regression in their counters from a FW upgrade
      
         - User triggerable memory leak in rxe"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
        RDMA/irdma: Change returned type of irdma_setup_virt_qp to void
        RDMA/irdma: Change the returned type of irdma_set_hw_rsrc to void
        RDMA/irdma: change the returned type of irdma_sc_repost_aeq_entries to void
        RDMA/irdma: Check vsi pointer before using it
        RDMA/rxe: Fix memory leak in error path code
        RDMA/irdma: Change the returned type to void
        RDMA/irdma: Make spdxcheck.py happy
        RDMA/irdma: Fix unused variable total_size warning
        RDMA/bnxt_re: Fix stats counters
    • Merge branch 'for-5.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup · 51bbe7eb
      Linus Torvalds authored
      Pull cgroup fix from Tejun Heo:
       "Fix leak of filesystem context root which is triggered by LTP.
      
        Not too likely to be a problem in non-testing environments"
      
      * 'for-5.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
        cgroup1: fix leaked context root causing sporadic NULL deref in LTP
    • KVM: add missing compat KVM_CLEAR_DIRTY_LOG · 8750f9bb
      Paolo Bonzini authored
      The arguments to the KVM_CLEAR_DIRTY_LOG ioctl include a pointer,
      therefore it needs a compat ioctl implementation.  Otherwise,
      32-bit userspace fails to invoke it on 64-bit kernels; for x86
      it might work fine by chance if the padding is zero, but not
      on big-endian architectures.
      
      Reported-by: Thomas Sattler
      Cc: stable@vger.kernel.org
      Fixes: 2a31b9db ("kvm: introduce manual dirty log reprotect")
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
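The pointer-width problem in the KVM_CLEAR_DIRTY_LOG commit can be sketched in plain C. The two structs below are hypothetical stand-ins (the field names and layout are assumptions, not copied from the kernel headers): a 32-bit caller stores a 4-byte pointer in a slot that a 64-bit kernel reads as 8 bytes. Reading the slot at full width picks up whatever sits in the padding, while a compat path reads only the 32-bit field and widens it.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 64-bit view: the pointer slot is 8 bytes wide. */
struct clear_log_native {
	uint32_t slot;
	uint32_t num_pages;
	uint64_t first_page;
	uint64_t dirty_bitmap;	/* stands in for a 64-bit user pointer */
};

/* Hypothetical 32-bit view: a 4-byte pointer sharing the slot with padding. */
struct clear_log_compat {
	uint32_t slot;
	uint32_t num_pages;
	uint64_t first_page;
	union {
		uint32_t dirty_bitmap;	/* stands in for a 32-bit user pointer */
		uint64_t padding2;
	};
};

_Static_assert(sizeof(struct clear_log_native) ==
	       sizeof(struct clear_log_compat),
	       "same size, but the last 8 bytes mean different things");

/* Without a compat handler: the caller's bytes are read with the 64-bit
 * layout, so the padding bytes become part of the pointer value. */
static uint64_t pointer_without_compat(const struct clear_log_compat *c)
{
	struct clear_log_native n;

	memcpy(&n, c, sizeof(n));
	return n.dirty_bitmap;
}

/* With a compat handler, in spirit: read the 32-bit field and widen it. */
static uint64_t pointer_with_compat(const struct clear_log_compat *c)
{
	return (uint64_t)c->dirty_bitmap;
}
```

On a little-endian machine the full-width read happens to match when the padding is zero, which is the "works by chance" case the commit message mentions. On big-endian the 32-bit value lands in the upper half of the slot, so the full-width read is wrong even with zero padding.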
    • KVM: use cpu_relax when halt polling · 74775654
      Li RongQing authored
      SMT siblings share caches and other hardware, so busy halt polling on
      one thread degrades the performance of its sibling when that sibling
      is doing work.

      Sean Christopherson suggested the following:
      
      "Rather than disallowing halt-polling entirely, on x86 it should be
      sufficient to simply have the hardware thread yield to its sibling(s)
      via PAUSE.  It probably won't get back all performance, but I would
      expect it to be close.
      This compiles on all KVM architectures, and AFAICT the intended usage
      of cpu_relax() is identical for all architectures."
      Suggested-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Li RongQing <lirongqing@baidu.com>
      Message-Id: <20210727111247.55510-1-lirongqing@baidu.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
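The idea behind the halt-polling change can be sketched in userspace. This is not the KVM code: `relax_hint()` is a hypothetical stand-in for the kernel's cpu_relax(), using the compiler's PAUSE builtin on x86 and only a compiler barrier elsewhere, and `poll_for_wakeup()` is an assumed name for a bounded polling loop that issues the hint between checks.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for cpu_relax(): PAUSE on x86,
 * otherwise just a compiler barrier (real kernels use an
 * architecture-specific hint here). */
static inline void relax_hint(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__builtin_ia32_pause();
#else
	__asm__ __volatile__("" ::: "memory");
#endif
}

/* Poll the flag at most max_spins times, issuing the relax hint between
 * checks so a sibling hardware thread gets a chance to make progress.
 * Returns true if the flag was observed set. */
static bool poll_for_wakeup(atomic_bool *wakeup, unsigned long max_spins)
{
	for (unsigned long i = 0; i < max_spins; i++) {
		if (atomic_load_explicit(wakeup, memory_order_acquire))
			return true;
		relax_hint();
	}
	return false;
}
```

The loop still polls rather than sleeping, which keeps the latency benefit of halt polling while being friendlier to an SMT sibling.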
    • KVM: SVM: use vmcb01 in svm_refresh_apicv_exec_ctrl · 5868b822
      Maxim Levitsky authored
      Currently when SVM is enabled in guest CPUID, AVIC is inhibited as soon
      as the guest CPUID is set.
      
      AVIC happens to be fully disabled on all vCPUs by the time any guest
      entry starts (after migration, that entry can be a nested one).
      
      The reason is that we currently disable AVIC right away on the vCPU from
      which kvm_request_apicv_update was called, and in this case it happens to
      be called on all vCPUs (by svm_vcpu_after_set_cpuid).
      
      After we stop doing this, AVIC will end up being disabled only when
      KVM_REQ_APICV_UPDATE is processed, which is after we are done switching
      to the nested guest.
      
      Fix this by just using vmcb01 in svm_refresh_apicv_exec_ctrl for AVIC
      (which is the right thing to do anyway).
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210713142023.106183-4-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: tweak warning about enabled AVIC on nested entry · feea0136
      Maxim Levitsky authored
      It is possible that AVIC was requested to be disabled but
      not yet disabled, e.g. if the nested entry is done right
      after svm_vcpu_after_set_cpuid.
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210713142023.106183-3-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: svm_set_vintr don't warn if AVIC is active but is about to be deactivated · f1577ab2
      Maxim Levitsky authored
      It is possible for AVIC inhibit and AVIC active state to be mismatched.
      Currently we disable AVIC right away on the vCPU which started the AVIC
      inhibit request, so this warning doesn't trigger. At least in theory,
      however, if svm_set_vintr is called at the same time on multiple vCPUs,
      the warning can happen.
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210713142023.106183-2-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: s390: restore old debugfs names · bb000f64
      Christian Borntraeger authored
      Commit bc9e9e67 ("KVM: debugfs: Reuse binary stats descriptors")
      replaced the old definitions with the binary ones. While doing that
      it missed that some files are named differently from the counters. This
      is especially important for kvm_stat, which has special handling
      for counters named instruction_*.
      
      Fixes: commit bc9e9e67 ("KVM: debugfs: Reuse binary stats descriptors")
      CC: Jing Zhang <jingzhangos@google.com>
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Message-Id: <20210726150108.5603-1-borntraeger@de.ibm.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: delay svm_vcpu_init_msrpm after svm->vmcb is initialized · 3fa5e8fd
      Paolo Bonzini authored
      Right now, svm_hv_vmcb_dirty_nested_enlightenments has an incorrect
      dereference of vmcb->control.reserved_sw before the vmcb is checked
      for being non-NULL.  The compiler usually sinks the dereference
      below the check; instead of doing this ourselves in the source,
      ensure that svm_hv_vmcb_dirty_nested_enlightenments is only called
      with a non-NULL VMCB.
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Vineeth Pillai <viremana@linux.microsoft.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      [Untested for now due to issues with my AMD machine. - Paolo]
    • KVM: selftests: Introduce access_tracking_perf_test · c33e05d9
      David Matlack authored
      This test measures the performance effects of KVM's access tracking.
      Access tracking is driven by the MMU notifiers test_young, clear_young,
      and clear_flush_young. These notifiers do not have a direct userspace
      API; however, the clear_young notifier can be triggered by marking
      pages as idle in /sys/kernel/mm/page_idle/bitmap. This test leverages
      that mechanism to enable access tracking on guest memory.
      
      To measure performance this test runs a VM with a configurable number of
      vCPUs that each touch every page in disjoint regions of memory.
      Performance is measured in the time it takes all vCPUs to finish
      touching their predefined region.
      
      Example invocation:
      
        $ ./access_tracking_perf_test -v 8
        Testing guest mode: PA-bits:ANY, VA-bits:48,  4K pages
        guest physical test memory offset: 0xffdfffff000
      
        Populating memory             : 1.337752570s
        Writing to populated memory   : 0.010177640s
        Reading from populated memory : 0.009548239s
        Mark memory idle              : 23.973131748s
        Writing to idle memory        : 0.063584496s
        Mark memory idle              : 24.924652964s
        Reading from idle memory      : 0.062042814s
      
      Breaking down the results:
      
       * "Populating memory": The time it takes for all vCPUs to perform the
         first write to every page in their region.
      
       * "Writing to populated memory" / "Reading from populated memory": The
         time it takes for all vCPUs to write and read to every page in their
         region after it has been populated. This serves as a control for the
         later results.
      
       * "Mark memory idle": The time it takes for every vCPU to mark every
         page in their region as idle through page_idle.
      
       * "Writing to idle memory" / "Reading from idle memory": The time it
         takes for all vCPUs to write and read to every page in their region
         after it has been marked idle.
      
      This test should be portable across architectures but it is only enabled
      for x86_64 since that's all I have tested.
      Reviewed-by: Ben Gardon <bgardon@google.com>
      Signed-off-by: David Matlack <dmatlack@google.com>
      Message-Id: <20210713220957.3493520-7-dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
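Marking a page idle through /sys/kernel/mm/page_idle/bitmap means locating that page's bit in the file. The helpers below are a sketch based on the layout described in the kernel's idle page tracking documentation as I understand it: an array of 8-byte words in which the page at PFN i maps to bit i % 64 of word i / 64. The function names are hypothetical and are not taken from the selftest.

```c
#include <stdint.h>

/* Byte offset within the page_idle bitmap file of the 8-byte word that
 * holds the bit for the given PFN (assumed layout: 64 PFNs per word). */
static uint64_t page_idle_offset(uint64_t pfn)
{
	return (pfn / 64) * 8;
}

/* Mask selecting the given PFN's bit inside that 8-byte word. */
static uint64_t page_idle_mask(uint64_t pfn)
{
	return UINT64_C(1) << (pfn % 64);
}
```

For PFN 130, for instance, the word sits at byte offset 16 and the mask is 0x4. A tool would then write that 8-byte mask at that offset to mark the page idle.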
    • KVM: selftests: Fix missing break in dirty_log_perf_test arg parsing · 15b7b737
      David Matlack authored
      There is a missing break statement which causes a fallthrough to the
      next statement where optarg will be null and a segmentation fault will
      be generated.
      
      Fixes: 9e965bb7 ("KVM: selftests: Add backing src parameter to dirty_log_perf_test")
      Reviewed-by: Ben Gardon <bgardon@google.com>
      Signed-off-by: David Matlack <dmatlack@google.com>
      Message-Id: <20210713220957.3493520-6-dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • x86/kvm: fix vcpu-id indexed array sizes · 76b4f357
      Juergen Gross authored
      KVM_MAX_VCPU_ID is the maximum vcpu-id of a guest, and not the number
      of vcpu-ids. Fix arrays indexed by vcpu-id to have KVM_MAX_VCPU_ID+1
      elements.
      
      Note that this is currently not a real problem, as KVM_MAX_VCPU_ID is
      an odd number, resulting in always enough padding being available at
      the end of those arrays.
      
      Nevertheless this should be fixed in order to avoid rare problems in
      case someone is using an even number for KVM_MAX_VCPU_ID.
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Message-Id: <20210701154105.23215-2-jgross@suse.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
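The off-by-one described in the vcpu-id commit comes from treating a maximum id as a count. A minimal sketch, with a hypothetical `MAX_VCPU_ID` of 7 and made-up names (`id_map`, `set_id()`), neither taken from KVM: valid ids run from 0 to the maximum inclusive, so the array needs one more element than the maximum.

```c
#include <stddef.h>

#define MAX_VCPU_ID 7	/* hypothetical: largest valid id, so ids are 0..7 */

/* One slot per possible id: MAX_VCPU_ID + 1 elements, not MAX_VCPU_ID. */
static unsigned char id_map[MAX_VCPU_ID + 1];

/* Store val for id; reject ids above the maximum. */
static int set_id(unsigned int id, unsigned char val)
{
	if (id > MAX_VCPU_ID)
		return -1;
	id_map[id] = val;
	return 0;
}
```

Had the array been declared with only `MAX_VCPU_ID` elements, the write for id 7 would land one element past the end.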
    • Merge branch 'for-5.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq · 82d712f6
      Linus Torvalds authored
      Pull workqueue fix from Tejun Heo:
       "Fix a use-after-free in allocation failure handling path"
      
      * 'for-5.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
        workqueue: fix UAF in pwq_unbound_release_workfn()
  5. 26 Jul, 2021 17 commits