1. 15 Jun, 2023 5 commits
    • Merge branch for-next/module-alloc into kvmarm/next · acfdf34c
      Oliver Upton authored
      * for-next/module-alloc:
        : Drag in module VA rework to handle conflicts w/ sw feature refactor
        arm64: module: rework module VA range selection
        arm64: module: mandate MODULE_PLTS
        arm64: module: move module randomization to module.c
        arm64: kaslr: split kaslr/module initialization
        arm64: kasan: remove !KASAN_VMALLOC remnants
        arm64: module: remove old !KASAN_VMALLOC logic
      Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
    • Merge branch kvm-arm64/hvhe into kvmarm/next · b710fe0d
      Oliver Upton authored
      * kvm-arm64/hvhe:
        : Support for running split-hypervisor w/VHE, courtesy of Marc Zyngier
        :
        : From the cover letter:
        :
        : KVM (on ARMv8.0) and pKVM (on all revisions of the architecture) use
        : the split hypervisor model that makes the EL2 code more or less
        : standalone. In the latter case, we totally ignore the VHE mode and
        : stick with the good old v8.0 EL2 setup.
        :
        : We introduce a new "mode" for KVM called hVHE, in reference to the
        : nVHE mode, and indicating that only the hypervisor is using VHE.
        KVM: arm64: Fix hVHE init on CPUs where HCR_EL2.E2H is not RES1
        arm64: Allow arm64_sw.hvhe on command line
        KVM: arm64: Force HCR_E2H in guest context when ARM64_KVM_HVHE is set
        KVM: arm64: Program the timer traps with VHE layout in hVHE mode
        KVM: arm64: Rework CPTR_EL2 programming for HVHE configuration
        KVM: arm64: Adjust EL2 stage-1 leaf AP bits when ARM64_KVM_HVHE is set
        KVM: arm64: Disable TTBR1_EL2 when using ARM64_KVM_HVHE
        KVM: arm64: Force HCR_EL2.E2H when ARM64_KVM_HVHE is set
        KVM: arm64: Key use of VHE instructions in nVHE code off ARM64_KVM_HVHE
        KVM: arm64: Remove alternatives from sysreg accessors in VHE hypervisor context
        arm64: Use CPACR_EL1 format to set CPTR_EL2 when E2H is set
        arm64: Allow EL1 physical timer access when running VHE
        arm64: Don't enable VHE for the kernel if OVERRIDE_HVHE is set
        arm64: Add KVM_HVHE capability and has_hvhe() predicate
        arm64: Turn kaslr_feature_override into a generic SW feature override
        arm64: Prevent the use of is_kernel_in_hyp_mode() in hypervisor code
        KVM: arm64: Drop is_kernel_in_hyp_mode() from __invalidate_icache_guest_page()
      Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
    • Merge branch kvm-arm64/ffa-proxy into kvmarm/next · 1a08f492
      Oliver Upton authored
      * kvm-arm64/ffa-proxy:
        : pKVM FF-A Proxy, courtesy Will Deacon and Andrew Walbran
        :
        : From the cover letter:
        :
        : pKVM's primary goal is to protect guest pages from a compromised host by
        : enforcing access control restrictions using stage-2 page-tables. Sadly,
        : this cannot prevent TrustZone from accessing non-secure memory, and a
        : compromised host could, for example, perform a 'confused deputy' attack
        : by asking TrustZone to use pages that have been donated to protected
        : guests. This would effectively allow the host to have TrustZone
        : exfiltrate guest secrets on its behalf, hence breaking the isolation
        : that pKVM intends to provide.
        :
        : This series addresses this problem by providing pKVM with the ability to
        : monitor SMCs following the Arm FF-A protocol. FF-A provides (among other
        : things) a set of memory management APIs allowing the Normal World to
        : share, donate or lend pages with Secure. By monitoring these SMCs, pKVM
        : can ensure that the pages that are shared, lent or donated to Secure by
        : the host kernel are only pages that it owns.
        KVM: arm64: pkvm: Add support for fragmented FF-A descriptors
        KVM: arm64: Handle FFA_FEATURES call from the host
        KVM: arm64: Handle FFA_MEM_LEND calls from the host
        KVM: arm64: Handle FFA_MEM_RECLAIM calls from the host
        KVM: arm64: Handle FFA_MEM_SHARE calls from the host
        KVM: arm64: Add FF-A helpers to share/unshare memory with secure world
        KVM: arm64: Handle FFA_RXTX_MAP and FFA_RXTX_UNMAP calls from the host
        KVM: arm64: Allocate pages for hypervisor FF-A mailboxes
        KVM: arm64: Probe FF-A version and host/hyp partition ID during init
        KVM: arm64: Block unsafe FF-A calls from the host
      Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
    • Merge branch kvm-arm64/eager-page-splitting into kvmarm/next · 83510396
      Oliver Upton authored
      * kvm-arm64/eager-page-splitting:
        : Eager Page Splitting, courtesy of Ricardo Koller.
        :
        : Dirty logging performance is dominated by the cost of splitting
        : hugepages to PTE granularity. On systems that mere mortals can get their
        : hands on, each fault incurs the cost of a full break-before-make
        : pattern, wherein the broadcast invalidation and ensuing serialization
        : significantly increases fault latency.
        :
        : The goal of eager page splitting is to move the cost of hugepage
        : splitting out of the stage-2 fault path and instead into the ioctls
        : responsible for managing the dirty log:
        :
        :  - If manual protection is enabled for the VM, hugepage splitting
        :    happens in the KVM_CLEAR_DIRTY_LOG ioctl. This is desirable as it
        :    provides userspace granular control over hugepage splitting.
        :
        :  - Otherwise, if userspace relies on the legacy dirty log behavior
        :    (clear on collection), hugepage splitting is done at the moment dirty
        :    logging is enabled for a particular memslot.
        :
        : Support for eager page splitting requires explicit opt-in from
        : userspace, which is realized through the
        : KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE capability.
        arm64: kvm: avoid overflow in integer division
        KVM: arm64: Use local TLBI on permission relaxation
        KVM: arm64: Split huge pages during KVM_CLEAR_DIRTY_LOG
        KVM: arm64: Open-code kvm_mmu_write_protect_pt_masked()
        KVM: arm64: Split huge pages when dirty logging is enabled
        KVM: arm64: Add kvm_uninit_stage2_mmu()
        KVM: arm64: Refactor kvm_arch_commit_memory_region()
        KVM: arm64: Add kvm_pgtable_stage2_split()
        KVM: arm64: Add KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE
        KVM: arm64: Export kvm_are_all_memslots_empty()
        KVM: arm64: Add helper for creating unlinked stage2 subtrees
        KVM: arm64: Add KVM_PGTABLE_WALK flags for skipping CMOs and BBM TLBIs
        KVM: arm64: Rename free_removed to free_unlinked
      Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
    • KVM: arm64: Fix hVHE init on CPUs where HCR_EL2.E2H is not RES1 · 1700f89c
      Marc Zyngier authored
      On CPUs where E2H is RES1, we very quickly set the scene for
      running EL2 with a VHE configuration, as we do not have any other
      choice.
      
      However, CPUs that conform to the current writing of the architecture
      start with E2H=0, and only later upgrade with E2H=1. This is all
      good, but nothing there is actually reconfiguring EL2 to be able
      to correctly run the kernel at EL1. Huhuh...
      
      The "obvious" solution is not to just reinitialise the timer
      controls like we do, but to really initialise *everything*
      unconditionally.
      
      This requires a bit of surgery, and is a good opportunity to
      remove the macro that messes with SPSR_EL2 in init_el2_state.
      
      With that, hVHE now works correctly on my trusted A55 machine!
      Reported-by: Oliver Upton <oliver.upton@linux.dev>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20230614155129.2697388-1-maz@kernel.org
      Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
  2. 12 Jun, 2023 16 commits
  3. 06 Jun, 2023 6 commits
    • arm64: module: rework module VA range selection · 3e35d303
      Mark Rutland authored
      Currently, the modules region is 128M in size, which is a problem for
      some large modules. Shanker reports [1] that the NVIDIA GPU driver alone
      can consume 110M of module space in some configurations. We'd like to
      make the modules region a full 2G such that we can always make use of a
      2G range.
      
      It's possible to build kernel images which are larger than 128M in some
      configurations, such as when many debug options are selected and many
      drivers are built in. In these configurations, we can't legitimately
      select a base for a 128M module region, though we currently select a
      value for which allocation will fail. It would be nicer to have a
      diagnostic message in this case.
      
      Similarly, in theory it's possible to build a kernel image which is
      larger than 2G and which cannot support modules. While this isn't likely
      to be the case for any realistic kernel deployed in the field, it would
      be nice if we could print a diagnostic in this case.
      
      This patch reworks the module VA range selection to use a 2G range, and
      improves handling of cases where we cannot select legitimate module
      regions. We now attempt to select a 128M region and a 2G region:
      
      * The 128M region is selected such that modules can use direct branches
        (with JUMP26/CALL26 relocations) to branch to kernel code and other
        modules, and so that modules can reference data and text (using PREL32
        relocations) anywhere in the kernel image and other modules.
      
        This region covers the entire kernel image (rather than just the text)
        to ensure that all PREL32 relocations are in range even when the
        kernel data section is absurdly large. Where we cannot allocate from
        this region, we'll fall back to the full 2G region.
      
      * The 2G region is selected such that modules can use direct branches
        with PLTs to branch to kernel code and other modules, and so that
        modules can reference data and text (with PREL32 relocations) in
        the kernel image and other modules.
      
        This region covers the entire kernel image, and the 128M region (if
        one is selected).
      
      The two module regions are randomized independently while ensuring the
      constraints described above.
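The region constraints above can be illustrated with a small userspace sketch. This is not the kernel's implementation: the names `pick_window_base`, `select_regions`, and `struct module_regions` are hypothetical, the random inputs are passed in by the caller, and page alignment is ignored. It only shows the arithmetic: a window of a given size that must contain a range `[start, end)` can have its base anywhere in `[end - size, start]`, and the 2G window must contain the 128M window when one was selected.

```c
#include <stdint.h>

#define SZ_128M (UINT64_C(128) << 20)
#define SZ_2G   (UINT64_C(2) << 30)

/*
 * Hypothetical helper: choose the base of a window of `size` bytes that
 * fully contains [start, end). Valid bases lie in [end - size, start].
 * `rnd` stands in for a caller-supplied random value. Returns 0 when the
 * range does not fit inside a window of that size.
 */
static uint64_t pick_window_base(uint64_t size, uint64_t start,
                                 uint64_t end, uint64_t rnd)
{
    if (end <= start || end < size || end - start >= size)
        return 0;

    uint64_t lowest = end - size;     /* window ends exactly at `end` */
    uint64_t slack = start - lowest;  /* how far the base may slide up */

    return lowest + rnd % (slack + 1);
}

/* Hypothetical result type: a base of 0 means "no such region". */
struct module_regions {
    uint64_t direct_base;  /* 128M region, direct branches in range */
    uint64_t plt_base;     /* 2G region, branches may need PLTs */
};

static struct module_regions select_regions(uint64_t kernel_start,
                                            uint64_t kernel_end,
                                            uint64_t rnd_direct,
                                            uint64_t rnd_plt)
{
    struct module_regions r;
    uint64_t min = kernel_start;
    uint64_t max = kernel_end;

    /* The 128M region must cover the whole kernel image. */
    r.direct_base = pick_window_base(SZ_128M, min, max, rnd_direct);
    if (r.direct_base) {
        /* The 2G region must then also cover the 128M region. */
        min = r.direct_base;
        max = r.direct_base + SZ_128M;
    }

    /* The 2G region covers the image and, if selected, the 128M region. */
    r.plt_base = pick_window_base(SZ_2G, min, max, rnd_plt);
    return r;
}
```

Under these assumptions, a 40M image gets both regions, a 200M image gets only the 2G region, and an image of 2G or more gets neither, which is the case where the commit message asks for a diagnostic.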
      
      [1] https://lore.kernel.org/linux-arm-kernel/159ceeab-09af-3174-5058-445bc8dcf85b@nvidia.com/
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
      Cc: Shanker Donthineni <sdonthineni@nvidia.com>
      Cc: Will Deacon <will@kernel.org>
      Tested-by: Shanker Donthineni <sdonthineni@nvidia.com>
      Link: https://lore.kernel.org/r/20230530110328.2213762-7-mark.rutland@arm.com
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    • arm64: module: mandate MODULE_PLTS · ea3752ba
      Mark Rutland authored
      Contemporary kernels and modules can be relatively large, especially
      when common debug options are enabled. Using GCC 12.1.0, a v6.3-rc7
      defconfig kernel is ~38M, and with PROVE_LOCKING + KASAN_INLINE enabled
      this expands to ~117M. Shanker reports [1] that the NVIDIA GPU driver
      alone can consume 110M of module space in some configurations.
      
      Both KASLR and ARM64_ERRATUM_843419 select MODULE_PLTS, so anyone
      wanting a kernel to have KASLR or run on Cortex-A53 will have
      MODULE_PLTS selected. This is the case in defconfig and distribution
      kernels (e.g. Debian, Android, etc).
      
      Practically speaking, this means we're very likely to need MODULE_PLTS
      and while it's almost guaranteed that MODULE_PLTS will be selected, it
      is possible to disable support, and we have to maintain some awkward
      special cases for such unusual configurations.
      
      This patch removes the MODULE_PLTS config option, with the support code
      always enabled if MODULES is selected. This results in a slight
      simplification, and will allow for further improvement in subsequent
      patches.
      
      For any config which currently selects MODULE_PLTS, there will be no
      functional change as a result of this patch.
      
      [1] https://lore.kernel.org/linux-arm-kernel/159ceeab-09af-3174-5058-445bc8dcf85b@nvidia.com/
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
      Cc: Shanker Donthineni <sdonthineni@nvidia.com>
      Cc: Will Deacon <will@kernel.org>
      Tested-by: Shanker Donthineni <sdonthineni@nvidia.com>
      Link: https://lore.kernel.org/r/20230530110328.2213762-6-mark.rutland@arm.com
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    • arm64: module: move module randomization to module.c · e46b7103
      Mark Rutland authored
      When CONFIG_RANDOMIZE_BASE=y, module_alloc_base is a variable which is
      configured by kaslr_module_init() in kaslr.c, and otherwise it is an
      expression defined in module.h.
      
      As kaslr_module_init() is no longer tightly coupled with the KASLR
      initialization code, we can centralize this in module.c.
      
      This patch moves kaslr_module_init() to module.c, making
      module_alloc_base a static variable, and removing redundant includes from
      kaslr.c. For the definition of struct arm64_ftr_override we must include
      <asm/cpufeature.h>, which was previously included transitively via
      another header.
      
      There should be no functional change as a result of this patch.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
      Cc: Will Deacon <will@kernel.org>
      Tested-by: Shanker Donthineni <sdonthineni@nvidia.com>
      Link: https://lore.kernel.org/r/20230530110328.2213762-5-mark.rutland@arm.com
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    • arm64: kaslr: split kaslr/module initialization · 6e13b6b9
      Mark Rutland authored
      Currently kaslr_init() handles a mixture of detecting/announcing whether
      KASLR is enabled, and randomizing the module region depending on whether
      KASLR is enabled.
      
      To make it easier to rework the module region initialization, split the
      KASLR initialization into two steps:
      
      * kaslr_init() determines whether KASLR should be enabled, and announces
        this choice, recording this to a new global boolean variable. This is
        called from setup_arch() just before the existing call to
        kaslr_requires_kpti() so that this will always provide the expected
        result.
      
      * kaslr_module_init() randomizes the module region when required. This
        is called as a subsys_initcall, where we previously called
        kaslr_init().
      
      As a bonus, moving the KASLR reporting earlier makes it easier to spot
      and permits it to be logged via earlycon, making it easier to debug any
      issues that could be triggered by KASLR.
      
      Booting a v6.4-rc1 kernel with this patch applied, the log looks like:
      
      | EFI stub: Booting Linux Kernel...
      | EFI stub: Generating empty DTB
      | EFI stub: Exiting boot services...
      | [    0.000000] Booting Linux on physical CPU 0x0000000000 [0x000f0510]
      | [    0.000000] Linux version 6.4.0-rc1-00006-g4763a8f8aeb3 (mark@lakrids) (aarch64-linux-gcc (GCC) 12.1.0, GNU ld (GNU Binutils) 2.38) #2 SMP PREEMPT Tue May  9 11:03:37 BST 2023
      | [    0.000000] KASLR enabled
      | [    0.000000] earlycon: pl11 at MMIO 0x0000000009000000 (options '')
      | [    0.000000] printk: bootconsole [pl11] enabled
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
      Cc: Will Deacon <will@kernel.org>
      Tested-by: Shanker Donthineni <sdonthineni@nvidia.com>
      Link: https://lore.kernel.org/r/20230530110328.2213762-4-mark.rutland@arm.com
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    • arm64: kasan: remove !KASAN_VMALLOC remnants · 55123aff
      Mark Rutland authored
      Historically, KASAN could be selected with or without KASAN_VMALLOC, but
      since commit:
      
        f6f37d93 ("arm64: select KASAN_VMALLOC for SW/HW_TAGS modes")
      
      ... we can never select KASAN without KASAN_VMALLOC on arm64, and thus
      arm64 code for KASAN && !KASAN_VMALLOC is redundant and can be removed.
      
      Remove the redundant code in kasan_init.c.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: Alexander Potapenko <glider@google.com>
      Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Will Deacon <will@kernel.org>
      Tested-by: Shanker Donthineni <sdonthineni@nvidia.com>
      Link: https://lore.kernel.org/r/20230530110328.2213762-3-mark.rutland@arm.com
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    • arm64: module: remove old !KASAN_VMALLOC logic · 8339f7d8
      Mark Rutland authored
      Historically, KASAN could be selected with or without KASAN_VMALLOC, and
      we had to be very careful where to place modules when KASAN_VMALLOC was
      not selected.
      
      However, since commit:
      
        f6f37d93 ("arm64: select KASAN_VMALLOC for SW/HW_TAGS modes")
      
      Selecting CONFIG_KASAN on arm64 will also select CONFIG_KASAN_VMALLOC,
      and so the logic for handling CONFIG_KASAN without CONFIG_KASAN_VMALLOC
      is redundant and can be removed.
      
      Note: the "kasan.vmalloc={on,off}" option which only exists for HW_TAGS
      changes whether the vmalloc region is given non-match-all tags, and does
      not affect the page table manipulation code.
      
      The VM_DEFER_KMEMLEAK flag was only necessary for !CONFIG_KASAN_VMALLOC
      as described in its introduction in commit:
      
        60115fa5 ("mm: defer kmemleak object creation of module_alloc()")
      
      ... and therefore it can also be removed.
      
      Remove the redundant logic for !CONFIG_KASAN_VMALLOC. At the same time,
      add the missing braces around the multi-line conditional block in
      arch/arm64/kernel/module.c.
      Suggested-by: Ard Biesheuvel <ardb@kernel.org>
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Will Deacon <will@kernel.org>
      Tested-by: Shanker Donthineni <sdonthineni@nvidia.com>
      Link: https://lore.kernel.org/r/20230530110328.2213762-2-mark.rutland@arm.com
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
  4. 01 Jun, 2023 10 commits
  5. 21 May, 2023 3 commits
    • Linux 6.4-rc3 · 44c026a7
      Linus Torvalds authored
    • Merge tag 'uml-for-linus-6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux · fa4fe8ce
      Linus Torvalds authored
      Pull UML fix from Richard Weinberger:
      
       - Fix modular build for UML watchdog
      
      * tag 'uml-for-linus-6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux:
        um: harddog: fix modular build
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · a35747c3
      Linus Torvalds authored
      Pull kvm fixes from Paolo Bonzini:
       "ARM:
      
         - Plug a race in the stage-2 mapping code where the IPA and the PA
           would end up being out of sync
      
         - Make better use of the bitmap API (bitmap_zero, bitmap_zalloc...)
      
         - FP/SVE/SME documentation update, in the hope that this field
           becomes clearer...
      
         - Add workaround for Apple SEIS brokenness to a new SoC
      
         - Random comment fixes
      
        x86:
      
         - add MSR_IA32_TSX_CTRL into msrs_to_save
      
         - fixes for XCR0 handling in SGX enclaves
      
        Generic:
      
         - Fix vcpu_array[0] races
      
         - Fix race between starting a VM and 'reboot -f'"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: VMX: add MSR_IA32_TSX_CTRL into msrs_to_save
        KVM: x86: Don't adjust guest's CPUID.0x12.1 (allowed SGX enclave XFRM)
        KVM: VMX: Don't rely _only_ on CPUID to enforce XCR0 restrictions for ECREATE
        KVM: Fix vcpu_array[0] races
        KVM: VMX: Fix header file dependency of asm/vmx.h
        KVM: Don't enable hardware after a restart/shutdown is initiated
        KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdown
        KVM: arm64: vgic: Add Apple M2 PRO/MAX cpus to the list of broken SEIS implementations
        KVM: arm64: Clarify host SME state management
        KVM: arm64: Restructure check for SVE support in FP trap handler
        KVM: arm64: Document check for TIF_FOREIGN_FPSTATE
        KVM: arm64: Fix repeated words in comments
        KVM: arm64: Constify start/end/phys fields of the pgtable walker data
        KVM: arm64: Infer PA offset from VA in hyp map walker
        KVM: arm64: Infer the PA offset from IPA in stage-2 map walker
        KVM: arm64: Use the bitmap API to allocate bitmaps
        KVM: arm64: Slightly optimize flush_context()