1. 16 Oct, 2023 23 commits
    • arm64: Avoid cpus_have_const_cap() for ARM64_HAS_RNG · 1963d966
      Mark Rutland authored
      In __cpu_has_rng() we use cpus_have_const_cap() to check for
      ARM64_HAS_RNG, but this is not necessary and alternative_has_cap_*()
      would be preferable.
      
      For historical reasons, cpus_have_const_cap() is more complicated than
      it needs to be. Before cpucaps are finalized, it will perform a bitmap
      test of the system_cpucaps bitmap, and once cpucaps are finalized it
      will use an alternative branch. This used to be necessary to handle some
      race conditions in the window between cpucap detection and the
      subsequent patching of alternatives and static branches, where different
      branches could be out-of-sync with one another (or w.r.t. alternative
      sequences). Now that we use alternative branches instead of static
      branches, these are all patched atomically w.r.t. one another, and there
      are only a handful of cases that need special care in the window between
      cpucap detection and alternative patching.
      
      Due to the above, it would be nice to remove cpus_have_const_cap(), and
      migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
      or cpus_have_cap() depending on their requirements. This will
      remove redundant instructions and improve code generation, and will make
      it easier to determine how each callsite will behave before, during, and
      after alternative patching.
      
      In the window between detecting the ARM64_HAS_RNG cpucap and patching
      alternative branches, nothing which calls __cpu_has_rng() can run, and
      hence it's not necessary to use cpus_have_const_cap().
      
      This patch replaces the use of cpus_have_const_cap() with
      alternative_has_cap_unlikely(), which will avoid generating code to test
      the system_cpucaps bitmap and should be better for all subsequent calls
      at runtime.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: Mark Brown <broonie@kernel.org>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
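The difference between the helpers discussed in the commit above can be sketched with a small userspace model. This is not kernel code: every `model_*` name is hypothetical, a plain `bool` array stands in for the patched branch instructions, and the capability index is made up for illustration.

```c
#include <stdbool.h>

/* Hypothetical capability index and count, for illustration only. */
enum { MODEL_HAS_RNG = 0, MODEL_NCAPS = 8 };

static unsigned long model_system_cpucaps;  /* stands in for the system_cpucaps bitmap */
static bool model_caps_finalized;           /* set once patching has completed */
static bool model_patched[MODEL_NCAPS];     /* stands in for patched alternative branches */

/* Detection: record the capability in the bitmap. */
static inline void model_detect_cap(unsigned int cap)
{
    model_system_cpucaps |= 1UL << cap;
}

/* Patching: copy the bitmap into the "branches", then mark caps finalized. */
static inline void model_apply_alternatives(void)
{
    for (unsigned int cap = 0; cap < MODEL_NCAPS; cap++)
        model_patched[cap] = (model_system_cpucaps >> cap) & 1UL;
    model_caps_finalized = true;
}

/* Bitmap test: correct any time after detection, at the cost of a load. */
static inline bool model_cpus_have_cap(unsigned int cap)
{
    return (model_system_cpucaps >> cap) & 1UL;
}

/* Patched branch: no bitmap access; reports false until patching has run. */
static inline bool model_alternative_has_cap_unlikely(unsigned int cap)
{
    return model_patched[cap];
}

/* The older helper: both paths end up in every call site. */
static inline bool model_cpus_have_const_cap(unsigned int cap)
{
    if (model_caps_finalized)
        return model_alternative_has_cap_unlikely(cap);
    return model_cpus_have_cap(cap);
}
```

In this model the two helpers disagree only in the window between `model_detect_cap()` and `model_apply_alternatives()`. If no caller can run in that window, as the commit argues for `__cpu_has_rng()`, the bitmap path is dead weight.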
    • arm64: Avoid cpus_have_const_cap() for ARM64_HAS_EPAN · 4e00f1d9
      Mark Rutland authored
      We use cpus_have_const_cap() to check for ARM64_HAS_EPAN but this is not
      necessary and alternative_has_cap() or cpus_have_cap() would be
      preferable.
      
      For historical reasons, cpus_have_const_cap() is more complicated than
      it needs to be. Before cpucaps are finalized, it will perform a bitmap
      test of the system_cpucaps bitmap, and once cpucaps are finalized it
      will use an alternative branch. This used to be necessary to handle some
      race conditions in the window between cpucap detection and the
      subsequent patching of alternatives and static branches, where different
      branches could be out-of-sync with one another (or w.r.t. alternative
      sequences). Now that we use alternative branches instead of static
      branches, these are all patched atomically w.r.t. one another, and there
      are only a handful of cases that need special care in the window between
      cpucap detection and alternative patching.
      
      Due to the above, it would be nice to remove cpus_have_const_cap(), and
      migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
      or cpus_have_cap() depending on their requirements. This will
      remove redundant instructions and improve code generation, and will make
      it easier to determine how each callsite will behave before, during, and
      after alternative patching.
      
      The ARM64_HAS_EPAN cpucap is used to affect two things:
      
      1) The permission bits used for userspace executable mappings, which are
         chosen by adjust_protection_map(), which is an arch_initcall. This is
         called after the ARM64_HAS_EPAN cpucap has been detected and
         alternatives have been patched, and before any userspace translation
         tables exist.
      
      2) The handling of faults taken from (user or kernel) accesses to
         userspace executable mappings in do_page_fault(). Userspace
         translation tables are created after adjust_protection_map() is
         called, and hence after the ARM64_HAS_EPAN cpucap has been detected
         and alternatives have been patched.
      
      Neither of these run until after ARM64_HAS_EPAN cpucap has been detected
      and alternatives have been patched, and hence there's no need to use
      cpus_have_const_cap(). Since adjust_protection_map() is only executed
      once at boot time it would be best for it to use cpus_have_cap(), and
      since do_page_fault() is executed frequently it would be best for it to
      use alternative_has_cap_unlikely().
      
      This patch replaces the uses of cpus_have_const_cap() with
      cpus_have_cap() and alternative_has_cap_unlikely(), which will avoid
      generating redundant code, and should be better for all subsequent calls
      at runtime. The ARM64_HAS_EPAN cpucap is added to cpucap_is_possible()
      so that code can be elided entirely when this is not possible.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Vladimir Murzin <vladimir.murzin@arm.com>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
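The commit above says that ARM64_HAS_EPAN is added to cpucap_is_possible() so that code can be elided when the capability is not possible. A minimal sketch of how a compile-time filter of that kind might work is shown below. It is a userspace model under stated assumptions: `MODEL_CONFIG_EPAN` is a hypothetical constant standing in for the kernel configuration option, and all `model_*` names are invented for illustration.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the Kconfig option; set to 0 to model "=n". */
#define MODEL_CONFIG_EPAN 1

/* Hypothetical capability indices. */
enum { MODEL_HAS_EPAN = 0, MODEL_HAS_OTHER = 1 };

static bool model_patched[2];   /* stands in for patched alternative branches */

/*
 * Compile-time filter: when the option is disabled this returns a constant
 * false for the capability, so the compiler can drop the guarded code.
 */
static inline bool model_cpucap_is_possible(unsigned int cap)
{
    switch (cap) {
    case MODEL_HAS_EPAN:
        return MODEL_CONFIG_EPAN;
    }
    return true;
}

/* Runtime check that first consults the compile-time filter. */
static inline bool model_alternative_has_cap_unlikely(unsigned int cap)
{
    if (!model_cpucap_is_possible(cap))
        return false;
    return model_patched[cap];
}

/* Frequently executed caller, in the style of a fault-handler check. */
static inline bool model_fault_needs_epan_handling(void)
{
    return model_alternative_has_cap_unlikely(MODEL_HAS_EPAN);
}
```

With `MODEL_CONFIG_EPAN` set to 0, `model_fault_needs_epan_handling()` folds to a constant `false` in this model, which is the kind of elision the commit message describes.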
    • arm64: Avoid cpus_have_const_cap() for ARM64_HAS_PAN · 53d62e99
      Mark Rutland authored
      In system_uses_hw_pan() we use cpus_have_const_cap() to check for
      ARM64_HAS_PAN, but this is only necessary so that the
      system_uses_ttbr0_pan() check in setup_cpu_features() can run prior to
      alternatives being patched, and otherwise this is not necessary and
      alternative_has_cap_*() would be preferable.
      
      For historical reasons, cpus_have_const_cap() is more complicated than
      it needs to be. Before cpucaps are finalized, it will perform a bitmap
      test of the system_cpucaps bitmap, and once cpucaps are finalized it
      will use an alternative branch. This used to be necessary to handle some
      race conditions in the window between cpucap detection and the
      subsequent patching of alternatives and static branches, where different
      branches could be out-of-sync with one another (or w.r.t. alternative
      sequences). Now that we use alternative branches instead of static
      branches, these are all patched atomically w.r.t. one another, and there
      are only a handful of cases that need special care in the window between
      cpucap detection and alternative patching.
      
      Due to the above, it would be nice to remove cpus_have_const_cap(), and
      migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
      or cpus_have_cap() depending on their requirements. This will
      remove redundant instructions and improve code generation, and will make
      it easier to determine how each callsite will behave before, during, and
      after alternative patching.
      
      The ARM64_HAS_PAN cpucap is used by system_uses_hw_pan() and
      system_uses_ttbr0_pan() depending on whether CONFIG_ARM64_SW_TTBR0_PAN
      is selected, and:
      
      * We only use system_uses_hw_pan() directly in __sdei_handler(), which
        isn't reachable until after alternatives have been patched, and for
        this it is safe to use alternative_has_cap_*().
      
      * We use system_uses_ttbr0_pan() in a few places:
      
        - In check_and_switch_context() and cpu_uninstall_idmap(), which will
          defer installing a translation table into TTBR0 when the
          ARM64_HAS_PAN cpucap is not detected.
      
          Prior to patching alternatives, all CPUs will be using init_mm with
          the reserved ttbr0 translation tables installed in TTBR0, so these can
          safely use alternative_has_cap_*().
      
        - In update_saved_ttbr0(), which will only save the active TTBR0 into
          a per-thread variable when the ARM64_HAS_PAN cpucap is not detected.
      
          Prior to patching alternatives, all CPUs will be using init_mm with
          the reserved ttbr0 translation tables installed in TTBR0, so these can
          safely use alternative_has_cap_*().
      
        - In efi_set_pgd(), which will handle check_and_switch_context()
          deferring the installation of TTBR0 when TTBR0 PAN is detected.
      
          The EFI runtime services are not initialized until after
          alternatives have been patched, and so this can safely use
          alternative_has_cap_*() or cpus_have_final_cap().
      
        - In uaccess_ttbr0_disable() and uaccess_ttbr0_enable(), where we'll
          avoid installing/uninstalling a translation table in TTBR0 when
          ARM64_HAS_PAN is detected.
      
          Prior to patching alternatives we will not perform any uaccess and
          will not call uaccess_ttbr0_disable() or uaccess_ttbr0_enable(), and
          so these can safely use alternative_has_cap_*() or
          cpus_have_final_cap().
      
        - In is_el1_permission_fault() where we will consider a translation
          fault on a TTBR0 address to be a permission fault when ARM64_HAS_PAN
          is not detected *and* we have set the PAN bit in the SPSR (which
          tells us that in the interrupted context, TTBR0 pointed at the
          reserved zero ttbr).
      
          In the window between detecting system cpucaps and patching
          alternatives we should not perform any accesses to TTBR0 addresses,
          and no userspace translation tables exist until after patching
          alternatives. Thus it is safe for this to use alternative_has_cap_*().
      
      This patch replaces the use of cpus_have_const_cap() with
      alternative_has_cap_unlikely(), which will avoid generating code to test
      the system_cpucaps bitmap and should be better for all subsequent calls
      at runtime.
      
      So that the check for TTBR0 PAN in setup_cpu_features() can run prior to
      alternatives being patched, the call to system_uses_ttbr0_pan() is
      replaced with an explicit check of the ARM64_HAS_PAN bit in the
      system_cpucaps bitmap.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
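The reason the commit above replaces the early system_uses_ttbr0_pan() call with an explicit bitmap check can be illustrated with a small userspace model. The sketch assumes a hypothetical `MODEL_CONFIG_SW_TTBR0_PAN` constant in place of the kernel configuration option; every `model_*` name is invented for illustration and none of this is kernel code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the Kconfig option. */
#define MODEL_CONFIG_SW_TTBR0_PAN 1

/* Hypothetical capability index. */
enum { MODEL_HAS_PAN = 0 };

static unsigned long model_system_cpucaps;  /* detected capabilities */
static bool model_pan_branch_patched;       /* state of the alternative branch */

/* Bitmap test: correct as soon as detection has run. */
static inline bool model_cpus_have_cap(unsigned int cap)
{
    return (model_system_cpucaps >> cap) & 1UL;
}

/* After the change: depends only on the patched branch. */
static inline bool model_system_uses_hw_pan(void)
{
    return model_pan_branch_patched;
}

/* Software PAN is used only when hardware PAN is not. */
static inline bool model_system_uses_ttbr0_pan(void)
{
    return MODEL_CONFIG_SW_TTBR0_PAN && !model_system_uses_hw_pan();
}

/* Early check: reads the bitmap directly, so it is valid before patching. */
static inline bool model_early_reports_ttbr0_pan(void)
{
    return MODEL_CONFIG_SW_TTBR0_PAN && !model_cpus_have_cap(MODEL_HAS_PAN);
}
```

In this model, once hardware PAN has been detected but before the branch is patched, `model_system_uses_ttbr0_pan()` still answers `true`, while the bitmap-based early check already answers `false`.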
    • arm64: Avoid cpus_have_const_cap() for ARM64_HAS_GIC_PRIO_MASKING · 20af807d
      Mark Rutland authored
      In system_uses_irq_prio_masking() we use cpus_have_const_cap() to check
      for ARM64_HAS_GIC_PRIO_MASKING, but this is not necessary and
      alternative_has_cap_*() would be preferable.
      
      For historical reasons, cpus_have_const_cap() is more complicated than
      it needs to be. Before cpucaps are finalized, it will perform a bitmap
      test of the system_cpucaps bitmap, and once cpucaps are finalized it
      will use an alternative branch. This used to be necessary to handle some
      race conditions in the window between cpucap detection and the
      subsequent patching of alternatives and static branches, where different
      branches could be out-of-sync with one another (or w.r.t. alternative
      sequences). Now that we use alternative branches instead of static
      branches, these are all patched atomically w.r.t. one another, and there
      are only a handful of cases that need special care in the window between
      cpucap detection and alternative patching.
      
      Due to the above, it would be nice to remove cpus_have_const_cap(), and
      migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
      or cpus_have_cap() depending on their requirements. This will
      remove redundant instructions and improve code generation, and will make
      it easier to determine how each callsite will behave before, during, and
      after alternative patching.
      
      When CONFIG_ARM64_PSEUDO_NMI=y the ARM64_HAS_GIC_PRIO_MASKING cpucap is
      a strict boot cpu feature which is detected and patched early on the
      boot cpu, which both happen in smp_prepare_boot_cpu(). In the window
      between the ARM64_HAS_GIC_PRIO_MASKING cpucap being detected and
      alternatives being patched we don't run any code that depends upon the
      ARM64_HAS_GIC_PRIO_MASKING cpucap:
      
      * We leave DAIF.IF set until after boot alternatives are patched, and
        interrupts are unmasked later in init_IRQ(), so we cannot reach
        IRQ/FIQ entry code and will not use irqs_priority_unmasked().
      
      * We don't call any code which uses arm_cpuidle_save_irq_context() and
        arm_cpuidle_restore_irq_context() during this window.
      
      * We don't call start_thread_common() during this window.
      
      * The local_irq_*() code in <asm/irqflags.h> depends solely on an
        alternative branch since commit:
      
        a5f61cc6 ("arm64: irqflags: use alternative branches for pseudo-NMI logic")
      
        ... and hence will use the default (DAIF-only) masking behaviour until
        alternatives are patched.
      
      * Secondary CPUs are brought up later after alternatives are patched,
        and alternatives are patched on the boot CPU immediately prior to
        calling init_gic_priority_masking(), so we'll correctly initialize
        interrupt masking regardless.
      
      This patch replaces the use of cpus_have_const_cap() with
      alternative_has_cap_unlikely(), which avoids generating code to test the
      system_cpucaps bitmap and should be better for all subsequent calls at
      runtime. As this makes system_uses_irq_prio_masking() equivalent to
      __irqflags_uses_pmr(), the latter is removed and replaced with the
      former for consistency.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    • arm64: Avoid cpus_have_const_cap() for ARM64_HAS_DIT · 25693f17
      Mark Rutland authored
      In __cpu_suspend_exit() we use cpus_have_const_cap() to check for
      ARM64_HAS_DIT but this is not necessary and cpus_have_final_cap() or
      alternative_has_cap_*() would be preferable.
      
      For historical reasons, cpus_have_const_cap() is more complicated than
      it needs to be. Before cpucaps are finalized, it will perform a bitmap
      test of the system_cpucaps bitmap, and once cpucaps are finalized it
      will use an alternative branch. This used to be necessary to handle some
      race conditions in the window between cpucap detection and the
      subsequent patching of alternatives and static branches, where different
      branches could be out-of-sync with one another (or w.r.t. alternative
      sequences). Now that we use alternative branches instead of static
      branches, these are all patched atomically w.r.t. one another, and there
      are only a handful of cases that need special care in the window between
      cpucap detection and alternative patching.
      
      Due to the above, it would be nice to remove cpus_have_const_cap(), and
      migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
      or cpus_have_cap() depending on their requirements. This will
      remove redundant instructions and improve code generation, and will make
      it easier to determine how each callsite will behave before, during, and
      after alternative patching.
      
      The ARM64_HAS_DIT cpucap is detected and patched (along with all other
      cpucaps) before __cpu_suspend_exit() can run. We'll only use
      __cpu_suspend_exit() as part of PSCI cpuidle or hibernation, and both of
      these are initialized after system cpucaps are detected and patched: the
      PSCI cpuidle driver is registered with a device_initcall, hibernation
      restoration occurs in a late_initcall, and hibernation saving is driven
      by userspace. Therefore it is not necessary to use cpus_have_const_cap(),
      and using alternative_has_cap_*() or cpus_have_final_cap() is
      sufficient.
      
      This patch replaces the use of cpus_have_const_cap() with
      alternative_has_cap_unlikely(), which will avoid generating code to test
      the system_cpucaps bitmap and should be better for all subsequent calls
      at runtime. To clearly document the ordering relationship between
      suspend/resume and alternatives patching, an explicit check for
      system_capabilities_finalized() is added to cpu_suspend() along with a
      comment block, which will make it easier to spot issues if code is
      changed in future to allow these functions to be reached earlier.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
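The ordering guard described in the commit above (an explicit system_capabilities_finalized() check in cpu_suspend()) might be modeled roughly as follows. This is a userspace sketch, not the kernel implementation: the `model_*` names are hypothetical, and returning `-EBUSY` on an early call is an assumption made for illustration, since the commit message only states that an explicit check is added.

```c
#include <errno.h>
#include <stdbool.h>

static bool model_caps_finalized;      /* set once alternatives are patched */
static bool model_dit_branch_patched;  /* state of the DIT alternative branch */
static bool model_dit_restored;        /* whether resume restored the DIT state */

static inline bool model_system_capabilities_finalized(void)
{
    return model_caps_finalized;
}

/* Resume path: relies solely on the patched branch. */
static inline void model_cpu_suspend_exit(void)
{
    if (model_dit_branch_patched)
        model_dit_restored = true;
}

/*
 * Suspend entry: refuse to run before capabilities are finalized, because
 * the resume path above would silently skip restoring state in that window.
 * The -EBUSY return value is an assumption of this sketch.
 */
static inline int model_cpu_suspend(void)
{
    if (!model_system_capabilities_finalized())
        return -EBUSY;

    model_dit_restored = false;   /* state is lost across suspend */
    model_cpu_suspend_exit();
    return 0;
}
```

The point of the guard in this model is diagnostic: a too-early caller fails visibly instead of resuming with state that was never restored.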
    • arm64: Avoid cpus_have_const_cap() for ARM64_HAS_CNP · 54c8818a
      Mark Rutland authored
      In system_supports_cnp() we use cpus_have_const_cap() to check for
      ARM64_HAS_CNP, but this is only necessary so that the cpu_enable_cnp()
      callback can run prior to alternatives being patched, and otherwise this
      is not necessary and alternative_has_cap_*() would be preferable.
      
      For historical reasons, cpus_have_const_cap() is more complicated than
      it needs to be. Before cpucaps are finalized, it will perform a bitmap
      test of the system_cpucaps bitmap, and once cpucaps are finalized it
      will use an alternative branch. This used to be necessary to handle some
      race conditions in the window between cpucap detection and the
      subsequent patching of alternatives and static branches, where different
      branches could be out-of-sync with one another (or w.r.t. alternative
      sequences). Now that we use alternative branches instead of static
      branches, these are all patched atomically w.r.t. one another, and there
      are only a handful of cases that need special care in the window between
      cpucap detection and alternative patching.
      
      Due to the above, it would be nice to remove cpus_have_const_cap(), and
      migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
      or cpus_have_cap() depending on their requirements. This will
      remove redundant instructions and improve code generation, and will make
      it easier to determine how each callsite will behave before, during, and
      after alternative patching.
      
      The cpu_enable_cnp() callback is run immediately after the ARM64_HAS_CNP
      cpucap is detected system-wide under setup_system_capabilities(), prior
      to alternatives being patched. During this window cpu_enable_cnp() uses
      cpu_replace_ttbr1() to set the CNP bit for the swapper_pg_dir in TTBR1.
      No other users of the ARM64_HAS_CNP cpucap need the up-to-date value
      during this window:
      
      * As KVM isn't initialized yet, kvm_get_vttbr() isn't reachable.
      
      * As cpuidle isn't initialized yet, __cpu_suspend_exit() isn't
        reachable.
      
      * At this point all CPUs are using the swapper_pg_dir with a reserved
        ASID in TTBR1, and the idmap_pg_dir in TTBR0, so neither
        check_and_switch_context() nor cpu_do_switch_mm() need to do anything
        special.
      
      This patch replaces the use of cpus_have_const_cap() with
      alternative_has_cap_unlikely(), which will avoid generating code to test
      the system_cpucaps bitmap and should be better for all subsequent calls
      at runtime. To allow cpu_enable_cnp() to function prior to alternatives
      being patched, cpu_replace_ttbr1() is split into cpu_replace_ttbr1() and
      cpu_enable_swapper_cnp(), with the former only used for early TTBR1
      replacement, and the latter used by both cpu_enable_cnp() and
      __cpu_suspend_exit().
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: Vladimir Murzin <vladimir.murzin@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    • arm64: Avoid cpus_have_const_cap() for ARM64_HAS_CACHE_DIC · 6766a8ef
      Mark Rutland authored
      In icache_inval_all_pou() we use cpus_have_const_cap() to check for
      ARM64_HAS_CACHE_DIC, but this is not necessary and
      alternative_has_cap_*() would be preferable.
      
      For historical reasons, cpus_have_const_cap() is more complicated than
      it needs to be. Before cpucaps are finalized, it will perform a bitmap
      test of the system_cpucaps bitmap, and once cpucaps are finalized it
      will use an alternative branch. This used to be necessary to handle some
      race conditions in the window between cpucap detection and the
      subsequent patching of alternatives and static branches, where different
      branches could be out-of-sync with one another (or w.r.t. alternative
      sequences). Now that we use alternative branches instead of static
      branches, these are all patched atomically w.r.t. one another, and there
      are only a handful of cases that need special care in the window between
      cpucap detection and alternative patching.
      
      Due to the above, it would be nice to remove cpus_have_const_cap(), and
      migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
      or cpus_have_cap() depending on their requirements. This will
      remove redundant instructions and improve code generation, and will make
      it easier to determine how each callsite will behave before, during, and
      after alternative patching.
      
      The cpus_have_const_cap() check in icache_inval_all_pou() is an
      optimization to skip a redundant (but benign) IC IALLUIS + DSB ISH
      sequence when all CPUs in the system have DIC. In the window between
      detecting the ARM64_HAS_CACHE_DIC cpucap and patching alternative
      branches there is only a single potential call to icache_inval_all_pou()
      (in the alternatives patching itself), which there's no need to optimize
      for at the expense of other callers.
      
      This patch replaces the use of cpus_have_const_cap() with
      alternative_has_cap_unlikely(), which will avoid generating code to test
      the system_cpucaps bitmap and should be better for all subsequent calls
      at runtime. This also aligns better with the way we patch the assembly
      cache maintenance sequences in arch/arm64/mm/cache.S.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
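The argument in the commit above is that the unpatched default path is merely redundant, not wrong. A minimal userspace sketch of that property, with hypothetical `model_*` names and a counter standing in for the IC IALLUIS + DSB ISH sequence:

```c
#include <stdbool.h>

static bool model_dic_branch_patched;        /* state of the DIC alternative branch */
static unsigned int model_ic_ialluis_count;  /* counts modeled IC IALLUIS + DSB ISH sequences */

/* Patched branch: reports false until alternatives have been applied. */
static inline bool model_alternative_has_cap_unlikely_dic(void)
{
    return model_dic_branch_patched;
}

/*
 * Before patching, the maintenance is always performed, which is redundant
 * on DIC systems but benign. After patching, it is skipped when all CPUs
 * have DIC.
 */
static inline void model_icache_inval_all_pou(void)
{
    if (model_alternative_has_cap_unlikely_dic())
        return;

    model_ic_ialluis_count++;   /* stands in for the real maintenance */
}
```

Under this model an early call costs one extra invalidation at most, which matches the commit's reasoning that the single potential early caller is not worth optimizing for.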
    • arm64: Avoid cpus_have_const_cap() for ARM64_HAS_BTI · bbbb6577
      Mark Rutland authored
      In system_supports_bti() we use cpus_have_const_cap() to check for
      ARM64_HAS_BTI, but this is not necessary and alternative_has_cap_*() or
      cpus_have_final_*cap() would be preferable.
      
      For historical reasons, cpus_have_const_cap() is more complicated than
      it needs to be. Before cpucaps are finalized, it will perform a bitmap
      test of the system_cpucaps bitmap, and once cpucaps are finalized it
      will use an alternative branch. This used to be necessary to handle some
      race conditions in the window between cpucap detection and the
      subsequent patching of alternatives and static branches, where different
      branches could be out-of-sync with one another (or w.r.t. alternative
      sequences). Now that we use alternative branches instead of static
      branches, these are all patched atomically w.r.t. one another, and there
      are only a handful of cases that need special care in the window between
      cpucap detection and alternative patching.
      
      Due to the above, it would be nice to remove cpus_have_const_cap(), and
      migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
      or cpus_have_cap() depending on their requirements. This will
      remove redundant instructions and improve code generation, and will make
      it easier to determine how each callsite will behave before, during, and
      after alternative patching.
      
      When CONFIG_ARM64_BTI_KERNEL=y, the ARM64_HAS_BTI cpucap is a strict
      boot cpu feature which is detected and patched early on the boot cpu.
      All uses guarded by CONFIG_ARM64_BTI_KERNEL happen after the boot CPU
      has detected ARM64_HAS_BTI and patched boot alternatives, and hence can
      safely use alternative_has_cap_*() or cpus_have_final_boot_cap().
      
      Regardless of CONFIG_ARM64_BTI_KERNEL, all other uses of ARM64_HAS_BTI
      happen after system capabilities have been finalized and alternatives
      have been patched. Hence these can safely use alternative_has_cap_*() or
      cpus_have_final_cap().
      
      This patch splits system_supports_bti() into system_supports_bti() and
      system_supports_bti_kernel(), with the former handling where the cpucap
      affects userspace functionality, and the latter handling where the
      cpucap affects kernel functionality. The use of cpus_have_const_cap() is
      replaced by cpus_have_final_cap() in system_supports_bti(), and
      cpus_have_final_boot_cap() in system_supports_bti_kernel(). This will
      avoid generating code to test the system_cpucaps bitmap and should be
      better for all subsequent calls at runtime. The use of
      cpus_have_final_cap() and cpus_have_final_boot_cap() will make it easier
      to spot if code is changed such that these run before the ARM64_HAS_BTI
      cpucap is guaranteed to have been finalized.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: Mark Brown <broonie@kernel.org>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
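The split described in the commit above, with one helper for userspace-facing functionality and one for kernel functionality, can be sketched as a userspace model. Everything here is an assumption made for illustration: the `model_*` names and `MODEL_CONFIG_*` constants are hypothetical, and `assert()` stands in for whatever the kernel does when a "final" helper is called too early.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the Kconfig options. */
#define MODEL_CONFIG_BTI        1
#define MODEL_CONFIG_BTI_KERNEL 1

static bool model_boot_caps_finalized;    /* boot CPU caps detected and patched */
static bool model_system_caps_finalized;  /* system-wide caps detected and patched */
static bool model_bti_branch_patched;     /* state of the BTI alternative branch */

/* Valid once boot alternatives are patched; traps on earlier use. */
static inline bool model_cpus_have_final_boot_cap_bti(void)
{
    assert(model_boot_caps_finalized);
    return model_bti_branch_patched;
}

/* Valid once system capabilities are finalized; traps on earlier use. */
static inline bool model_cpus_have_final_cap_bti(void)
{
    assert(model_system_caps_finalized);
    return model_bti_branch_patched;
}

/* Userspace-facing functionality: only needed after system finalization. */
static inline bool model_system_supports_bti(void)
{
    return MODEL_CONFIG_BTI && model_cpus_have_final_cap_bti();
}

/* Kernel functionality: needed as soon as boot alternatives are patched. */
static inline bool model_system_supports_bti_kernel(void)
{
    return MODEL_CONFIG_BTI_KERNEL && model_cpus_have_final_boot_cap_bti();
}
```

Compared with a silent bitmap fallback, helpers of this shape turn an ordering mistake into an immediate failure, which is the benefit the commit message attributes to the "final" variants.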
    • arm64: Avoid cpus_have_const_cap() for ARM64_HAS_ARMv8_4_TTL · d70bac1d
      Mark Rutland authored
      In __tlbi_level() we use cpus_have_const_cap() to check for
      ARM64_HAS_ARMv8_4_TTL, but this is not necessary and
      alternative_has_cap_*() would be preferable.
      
      For historical reasons, cpus_have_const_cap() is more complicated than
      it needs to be. Before cpucaps are finalized, it will perform a bitmap
      test of the system_cpucaps bitmap, and once cpucaps are finalized it
      will use an alternative branch. This used to be necessary to handle some
      race conditions in the window between cpucap detection and the
      subsequent patching of alternatives and static branches, where different
      branches could be out-of-sync with one another (or w.r.t. alternative
      sequences). Now that we use alternative branches instead of static
      branches, these are all patched atomically w.r.t. one another, and there
      are only a handful of cases that need special care in the window between
      cpucap detection and alternative patching.
      
      Due to the above, it would be nice to remove cpus_have_const_cap(), and
      migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
      or cpus_have_cap() depending on their requirements. This will
      remove redundant instructions and improve code generation, and will make
      it easier to determine how each callsite will behave before, during, and
      after alternative patching.
      
      In the window between detecting the ARM64_HAS_ARMv8_4_TTL cpucap and
      patching alternative branches, we do not perform any TLB invalidation,
      and even if we were to perform TLB invalidation here it would not be
      functionally necessary to optimize this by using the TTL hint. Hence
      there's no need to use cpus_have_const_cap(), and
      alternative_has_cap_unlikely() is sufficient.
      
      This patch replaces the use of cpus_have_const_cap() with
      alternative_has_cap_unlikely(), which will avoid generating code to test
      the system_cpucaps bitmap and should be better for all subsequent calls
      at runtime.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    • arm64: Avoid cpus_have_const_cap() for ARM64_HAS_{ADDRESS,GENERIC}_AUTH · 7f0387cf
      Mark Rutland authored
      In system_supports_address_auth() and system_supports_generic_auth() we
      use cpus_have_const_cap() to check for ARM64_HAS_ADDRESS_AUTH and
      ARM64_HAS_GENERIC_AUTH respectively, but this is not necessary and
      alternative_has_cap_*() would be preferable.
      
      For historical reasons, cpus_have_const_cap() is more complicated than
      it needs to be. Before cpucaps are finalized, it will perform a bitmap
      test of the system_cpucaps bitmap, and once cpucaps are finalized it
      will use an alternative branch. This used to be necessary to handle some
      race conditions in the window between cpucap detection and the
      subsequent patching of alternatives and static branches, where different
      branches could be out-of-sync with one another (or w.r.t. alternative
      sequences). Now that we use alternative branches instead of static
      branches, these are all patched atomically w.r.t. one another, and there
      are only a handful of cases that need special care in the window between
      cpucap detection and alternative patching.
      
      Due to the above, it would be nice to remove cpus_have_const_cap(), and
      migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
      or cpus_have_cap() depending on their requirements. This will
      remove redundant instructions and improve code generation, and will make
      it easier to determine how each callsite will behave before, during, and
      after alternative patching.
      
      The ARM64_HAS_ADDRESS_AUTH cpucap is a boot cpu feature which is
      detected and patched early on the boot CPU before any pointer
      authentication keys are enabled via their respective SCTLR_ELx.EN* bits.
      Nothing which uses system_supports_address_auth() is called before the
      boot alternatives are patched. Thus it is safe for
      system_supports_address_auth() to use cpus_have_final_boot_cap() to
      check for ARM64_HAS_ADDRESS_AUTH.
      
      The ARM64_HAS_GENERIC_AUTH cpucap is a system feature which is detected
      on all CPUs, then finalized and patched under
      setup_system_capabilities(). We use system_supports_generic_auth() in a
      few places:
      
      * The pac_generic_keys_get() and pac_generic_keys_set() functions are
        only reachable from system calls once userspace is up and running. As
        cpucaps are finalized long before userspace runs, these can safely use
        alternative_has_cap_*() or cpus_have_final_cap().
      
      * The ptrauth_prctl_reset_keys() function is only reachable from system
        calls once userspace is up and running. As cpucaps are finalized long
        before userspace runs, this can safely use alternative_has_cap_*() or
        cpus_have_final_cap().
      
      * The ptrauth_keys_install_user() function is used during
        context-switch. This is called prior to alternatives being applied,
        and so cannot use cpus_have_final_cap(), but as this only needs to
        switch the APGA key for userspace tasks, it's safe to use
        alternative_has_cap_*().
      
      * The ptrauth_keys_init_user() function is used to initialize userspace
        keys, and is only reachable after system cpucaps have been finalized
        and patched. Thus this can safely use alternative_has_cap_*() or
        cpus_have_final_cap().
      
      * The system_has_full_ptr_auth() helper function is only used by KVM
        code, which is only reachable after system cpucaps have been finalized
        and patched. Thus this can safely use alternative_has_cap_*() or
        cpus_have_final_cap().
      
      This patch modifies system_supports_address_auth() to use
      cpus_have_final_boot_cap() to check ARM64_HAS_ADDRESS_AUTH, and modifies
      system_supports_generic_auth() to use alternative_has_cap_unlikely() to
      check ARM64_HAS_GENERIC_AUTH. In either case this will avoid generating
      code to test the system_cpucaps bitmap and should be better for all
      subsequent calls at runtime. The use of cpus_have_final_boot_cap() will
      make it easier to spot if code is changed such that these run before
      the relevant cpucap is guaranteed to have been finalized.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      7f0387cf
    • Mark Rutland's avatar
      arm64: Use a positive cpucap for FP/SIMD · 34f66c4c
      Mark Rutland authored
      Currently we have a negative cpucap which describes the *absence* of
      FP/SIMD rather than *presence* of FP/SIMD. This largely works, but is
      somewhat awkward relative to other cpucaps that describe the presence of
      a feature, and it would be nicer to have a cpucap which describes the
      presence of FP/SIMD:
      
      * This will allow the cpucap to be treated as a standard
        ARM64_CPUCAP_SYSTEM_FEATURE, which can be detected with the standard
        has_cpuid_feature() function and ARM64_CPUID_FIELDS() description.
      
      * This ensures that the cpucap will only transition from not-present to
        present, reducing the risk of unintentional and/or unsafe usage of
        FP/SIMD before cpucaps are finalized.
      
      * This will allow using arm64_cpu_capabilities::cpu_enable() to enable
        the use of FP/SIMD later, with FP/SIMD being disabled at boot time
        otherwise. This will ensure that any unintentional and/or unsafe usage
        of FP/SIMD prior to this is trapped, and will ensure that FP/SIMD is
        never unintentionally enabled for userspace in mismatched big.LITTLE
        systems.
      
      This patch replaces the negative ARM64_HAS_NO_FPSIMD cpucap with a
      positive ARM64_HAS_FPSIMD cpucap, making changes as described above.
      Note that as FP/SIMD will now be trapped when not supported system-wide,
      do_fpsimd_acc() must handle these traps in the same way as for SVE and
      SME. The commentary in fpsimd_restore_current_state() is updated to
      describe the new scheme.
      
      No users of system_supports_fpsimd() need to know that FP/SIMD is
      available prior to alternatives being patched, so this is updated to
      use alternative_has_cap_likely() to check for the ARM64_HAS_FPSIMD
      cpucap, without generating code to test the system_cpucaps bitmap.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: Mark Brown <broonie@kernel.org>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      34f66c4c
    • Mark Rutland's avatar
      arm64: Rename SVE/SME cpu_enable functions · 14567ba4
      Mark Rutland authored
      The arm64_cpu_capabilities::cpu_enable() callbacks for SVE, SME, SME2,
      and FA64 are named with an unusual "${feature}_kernel_enable" pattern
      rather than the much more common "cpu_enable_${feature}". Now that we
      only use these as cpu_enable() callbacks, it would be nice to have them
      match the usual scheme.
      
      This patch renames the cpu_enable() callbacks to match this scheme. At
      the same time, the comment above cpu_enable_sve() is removed for
      consistency with the other cpu_enable() callbacks.
      
      There should be no functional change as a result of this patch.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: Mark Brown <broonie@kernel.org>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      14567ba4
    • Mark Rutland's avatar
      arm64: Use build-time assertions for cpucap ordering · 90772291
      Mark Rutland authored
      Both sme2_kernel_enable() and fa64_kernel_enable() need to run after
      sme_kernel_enable(). This happens to be true today as ARM64_SME has a
      lower index than either ARM64_SME2 or ARM64_SME_FA64, and both functions
      have a comment to this effect.
      
      It would be nicer to have a build-time assertion like we do for
      can_use_gic_priorities() and has_gic_prio_relaxed_sync(), as that way
      it will be harder to miss any potential breakage.
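      
      As a rough illustration of the idea (not the code in this patch), an
      ordering requirement between cap indices can be checked at build time.
      The sketch below uses C11 static_assert as a stand-in for the kernel's
      BUILD_BUG_ON(); the MODEL_* indices and model_enable_position() are
      hypothetical, and it assumes callbacks run in ascending index order.

```c
#include <assert.h> /* static_assert (C11) */

/* Hypothetical cap indices; the real values are generated at build time. */
enum model_cpucap {
	MODEL_SME,
	MODEL_SME_FA64,
	MODEL_SME2,
	MODEL_NCAPS,
};

/* Fails the build if someone reorders the indices incorrectly. */
static_assert(MODEL_SME < MODEL_SME2, "SME must be enabled before SME2");
static_assert(MODEL_SME < MODEL_SME_FA64, "SME must be enabled before FA64");

/* Position at which a cap's enable callback would run in this model. */
int model_enable_position(enum model_cpucap cap)
{
	return (int)cap;
}
```

      Unlike a comment, the assertion cannot silently go stale: moving
      MODEL_SME after either dependent cap stops the build.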
      
      This patch replaces the comments with build-time assertions.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      90772291
    • Mark Rutland's avatar
      arm64: Explicitly save/restore CPACR when probing SVE and SME · bc9bbb78
      Mark Rutland authored
      When a CPU is onlined we first probe for supported features and
      properties, and then we subsequently enable features that have been
      detected. This is a little problematic for SVE and SME, as some
      properties (e.g. vector lengths) cannot be probed while they are
      disabled. Due to this, the code probing for SVE properties has to enable
      SVE for EL1 prior to probing, and the code probing for SME properties
      has to enable SME for EL1 prior to probing. We never disable SVE or SME
      for EL1 after probing.
      
      It would be a little nicer to transiently enable SVE and SME during
      probing, leaving them both disabled unless explicitly enabled, as this
      would make it much easier to catch unintentional usage (e.g. when they
      are not present system-wide).
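      
      The save/enable/probe/restore pattern can be sketched as follows. This
      is a userspace model under stated assumptions: model_cpacr is a plain
      variable standing in for a control register such as CPACR_EL1,
      MODEL_SVE_EN is an illustrative bit rather than the real encoding, and
      the probed value of 256 is made up.

```c
#include <stdint.h>

/* Hypothetical stand-in for a control register such as CPACR_EL1. */
static uint64_t model_cpacr;

/* Illustrative enable bit; not the architectural encoding. */
#define MODEL_SVE_EN (UINT64_C(1) << 0)

uint64_t model_read_cpacr(void)
{
	return model_cpacr;
}

void model_write_cpacr(uint64_t v)
{
	model_cpacr = v;
}

/* Pretend probe: only yields a vector length while the feature is enabled. */
static unsigned int model_probe_vl(void)
{
	return (model_cpacr & MODEL_SVE_EN) ? 256 : 0;
}

/* Save the register, enable transiently, probe, then restore. */
unsigned int model_probe_sve_vl(void)
{
	uint64_t saved = model_read_cpacr();
	unsigned int vl;

	model_write_cpacr(saved | MODEL_SVE_EN);
	vl = model_probe_vl();
	model_write_cpacr(saved);

	return vl;
}
```

      After model_probe_sve_vl() returns, the register holds whatever it held
      before the call, so the feature stays disabled unless something else
      enables it explicitly.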
      
      This patch reworks the SVE and SME feature probing code to only
      transiently enable support at EL1, disabling after probing is complete.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Reviewed-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      bc9bbb78
    • Mark Rutland's avatar
      arm64: kvm: Use cpus_have_final_cap() explicitly · d8569fba
      Mark Rutland authored
      Much of the arm64 KVM code uses cpus_have_const_cap() to check for
      cpucaps, but this is unnecessary and it would be preferable to use
      cpus_have_final_cap().
      
      For historical reasons, cpus_have_const_cap() is more complicated than
      it needs to be. Before cpucaps are finalized, it will perform a bitmap
      test of the system_cpucaps bitmap, and once cpucaps are finalized it
      will use an alternative branch. This used to be necessary to handle some
      race conditions in the window between cpucap detection and the
      subsequent patching of alternatives and static branches, where different
      branches could be out-of-sync with one another (or w.r.t. alternative
      sequences). Now that we use alternative branches instead of static
      branches, these are all patched atomically w.r.t. one another, and there
      are only a handful of cases that need special care in the window between
      cpucap detection and alternative patching.
      
      Due to the above, it would be nice to remove cpus_have_const_cap(), and
      migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
      or cpus_have_cap() depending on their requirements. This will
      remove redundant instructions and improve code generation, and will make
      it easier to determine how each callsite will behave before, during, and
      after alternative patching.
      
      KVM is initialized after cpucaps have been finalized and alternatives
      have been patched. Since commit:
      
        d86de40d ("arm64: cpufeature: upgrade hyp caps to final")
      
      ... use of cpus_have_const_cap() in hyp code is automatically converted
      to use cpus_have_final_cap():
      
      | static __always_inline bool cpus_have_const_cap(int num)
      | {
      | 	if (is_hyp_code())
      | 		return cpus_have_final_cap(num);
      | 	else if (system_capabilities_finalized())
      | 		return __cpus_have_const_cap(num);
      | 	else
      | 		return cpus_have_cap(num);
      | }
      
      Thus, converting hyp code to use cpus_have_final_cap() directly will not
      result in any functional change.
      
      Non-hyp KVM code is also not executed until cpucaps have been finalized,
      and it would be preferable to extend the same treatment to this code and
      use cpus_have_final_cap() directly.
      
      This patch converts instances of cpus_have_const_cap() in KVM-only code
      over to cpus_have_final_cap(). As all of this code runs after cpucaps
      have been finalized, there should be no functional change as a result of
      this patch, but the redundant instructions generated by
      cpus_have_const_cap() will be removed from the non-hyp KVM code.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: Marc Zyngier <maz@kernel.org>
      Cc: Oliver Upton <oliver.upton@linux.dev>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      d8569fba
    • Mark Rutland's avatar
      arm64: Split kpti_install_ng_mappings() · 42c5a3b0
      Mark Rutland authored
      The arm64_cpu_capabilities::cpu_enable callbacks are intended for
      cpu-local feature enablement (e.g. poking system registers). These get
      called for each online CPU when boot/system cpucaps get finalized and
      enabled, and get called whenever a CPU is subsequently onlined.
      
      For KPTI with the ARM64_UNMAP_KERNEL_AT_EL0 cpucap, we use the
      kpti_install_ng_mappings() function as the cpu_enable callback. This
      does a mixture of cpu-local configuration (setting VBAR_EL1 to the
      appropriate trampoline vectors) and some global configuration (rewriting
      the swapper page tables to use non-global mappings) that must happen at
      most once.
      
      This patch splits kpti_install_ng_mappings() into a cpu-local
      cpu_enable_kpti() initialization function and a system-wide
      kpti_install_ng_mappings() function. The cpu_enable_kpti() function is
      responsible for selecting the necessary cpu-local vectors each time a
      CPU is onlined, and the kpti_install_ng_mappings() function performs the
      one-time rewrite of the translation tables to use non-global mappings.
      Splitting the two makes the code a bit easier to follow and also allows
      the page table rewriting code to be marked as __init such that it can be
      freed after use.
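      
      The shape of the split can be sketched in a small userspace model. All
      names carry a model_ prefix because they are hypothetical; the "done"
      guard is an assumption made to show the at-most-once property and is
      not taken from the patch.

```c
#include <stdbool.h>

#define MODEL_NR_CPUS 4

static bool model_vectors_set[MODEL_NR_CPUS]; /* cpu-local state */
static int model_table_rewrites;              /* system-wide state */

/* cpu-local: runs for every CPU that comes online. */
void model_cpu_enable_kpti(int cpu)
{
	model_vectors_set[cpu] = true;
}

/* system-wide: rewrites the (pretend) page tables, at most once. */
void model_kpti_install_ng_mappings(void)
{
	static bool done;

	if (done)
		return;
	done = true;
	model_table_rewrites++;
}
```

      Calling model_cpu_enable_kpti() for each CPU touches only that CPU's
      slot, while repeated calls to model_kpti_install_ng_mappings() leave the
      rewrite count at one.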
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      42c5a3b0
    • Mark Rutland's avatar
      arm64: Fixup user features at boot time · 7f632d33
      Mark Rutland authored
      For ARM64_WORKAROUND_2658417, we use a cpu_enable() callback to hide the
      ID_AA64ISAR1_EL1.BF16 ID register field. This is a little awkward as
      CPUs may attempt to apply the workaround concurrently, requiring that we
      protect the bulk of the callback with a raw_spinlock, and requiring some
      pointless work every time a CPU is subsequently hotplugged in.
      
      This patch makes this a little simpler by handling the masking once at
      boot time. A new user_feature_fixup() function is called at the start of
      setup_user_features() to mask the feature, matching the style of
      elf_hwcap_fixup(). The ARM64_WORKAROUND_2658417 cpucap is added to
      cpucap_is_possible() so that code can be elided entirely when this is
      not possible.
      
      Note that the ARM64_WORKAROUND_2658417 capability is matched with
      ERRATA_MIDR_RANGE(), which implicitly gives the capability a
      ARM64_CPUCAP_LOCAL_CPU_ERRATUM type, which forbids the late onlining of
      a CPU with the erratum if the erratum was not present at boot time.
      Therefore this patch doesn't change the behaviour for late onlining.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      7f632d33
    • Mark Rutland's avatar
      arm64: Rework setup_cpu_features() · 075f48c9
      Mark Rutland authored
      Currently setup_cpu_features() handles a mixture of one-time kernel
      feature setup (e.g. cpucaps) and one-time user feature setup (e.g. ELF
      hwcaps). Subsequent patches will rework other one-time setup and expand
      the logic currently in setup_cpu_features(), and in preparation for this
      it would be helpful to split the kernel and user setup into separate
      functions.
      
      This patch splits setup_user_features() out of setup_cpu_features(),
      with a few additional cleanups of note:
      
      * setup_cpu_features() is renamed to setup_system_features() to make it
        clear that it handles system-wide feature setup rather than cpu-local
        feature setup.
      
      * setup_system_capabilities() is folded into setup_system_features().
      
      * Presence of TTBR0 PAN is logged immediately after
        update_cpu_capabilities(), so that this is guaranteed to appear
        alongside all the other detected system cpucaps.
      
      * The 'cwg' variable is removed as its value is only consumed once and
        it's simpler to use cache_type_cwg() directly without assigning its
        return value to a variable.
      
      * The call to setup_user_features() is moved after alternatives are
        patched, which will allow user feature setup code to depend on
        alternative branches and allow for simplifications in subsequent
        patches.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      075f48c9
    • Mark Rutland's avatar
      arm64: Add cpus_have_final_boot_cap() · 7bf46aa1
      Mark Rutland authored
      The cpus_have_final_cap() function can be used to test a cpucap while
      also verifying that we do not consume the cpucap until system
      capabilities have been finalized. It would be helpful if we could do
      likewise for boot cpucaps.
      
      This patch adds a new cpus_have_final_boot_cap() helper which can be
      used to test a cpucap while also verifying that boot capabilities have
      been finalized. Users will be added in subsequent patches.
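      
      The two levels of "final" check can be illustrated with a userspace
      model. This is a sketch only: the stage enum, the model_* helpers, and
      the use of assert() to flag premature use are all hypothetical stand-ins
      and do not reproduce the kernel helpers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical boot stages, in the order they are reached. */
enum model_stage { MODEL_EARLY, MODEL_BOOT_FINAL, MODEL_SYSTEM_FINAL };

static enum model_stage model_current_stage = MODEL_EARLY;
static bool model_caps[8];

void model_set_cap(int cap)
{
	model_caps[cap] = true;
}

void model_advance(enum model_stage stage)
{
	model_current_stage = stage;
}

/* Usable once boot capabilities are final. */
bool model_final_boot_cap(int cap)
{
	assert(model_current_stage >= MODEL_BOOT_FINAL);
	return model_caps[cap];
}

/* Usable only once system capabilities are final. */
bool model_final_cap(int cap)
{
	assert(model_current_stage >= MODEL_SYSTEM_FINAL);
	return model_caps[cap];
}
```

      In the model, a boot cap can be tested via model_final_boot_cap() as
      soon as the MODEL_BOOT_FINAL stage is reached, without waiting for the
      later system-wide stage that model_final_cap() insists on.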
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      7bf46aa1
    • Mark Rutland's avatar
      arm64: Add cpucap_is_possible() · de66cb37
      Mark Rutland authored
      Many cpucaps can only be set when certain CONFIG_* options are selected,
      and we need to check the CONFIG_* option before the cap in order to
      avoid generating redundant code. Due to this, we have a growing number
      of helpers in <asm/cpufeature.h> of the form:
      
      | static __always_inline bool system_supports_foo(void)
      | {
      |         return IS_ENABLED(CONFIG_ARM64_FOO) &&
      |                 cpus_have_const_cap(ARM64_HAS_FOO);
      | }
      
      This is unfortunate as it forces us to use cpus_have_const_cap()
      unnecessarily, resulting in redundant code being generated by the
      compiler. In the vast majority of cases, we only require that feature
      checks indicate the presence of a feature after cpucaps have been
      finalized, and so it would be sufficient to use alternative_has_cap_*().
      However some code needs to handle a feature before alternatives have
      been patched, and must test the system_cpucaps bitmap via
      cpus_have_const_cap(). In other cases we'd like to check for
      unintentional usage of a cpucap before alternatives are patched, and so
      it would be preferable to use cpus_have_final_cap().
      
      Placing the IS_ENABLED() checks in each callsite is tedious and
      error-prone, and the same applies for writing wrappers for each
      combination of cpucap and alternative_has_cap_*() / cpus_have_cap() /
      cpus_have_final_cap(). It would be nicer if we could centralize the
      knowledge of which cpucaps are possible, and have
      alternative_has_cap_*(), cpus_have_cap(), and cpus_have_final_cap()
      handle this automatically.
      
      This patch adds a new cpucap_is_possible() function which will be
      responsible for checking the CONFIG_* option, and updates the low-level
      cpucap checks to use this. The existing CONFIG_* checks in
      <asm/cpufeature.h> are moved over to cpucap_is_possible(), but the (now
      trivial) wrapper functions are retained for now.
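      
      A minimal sketch of the centralized check, assuming made-up build
      options and cap names (MODEL_CONFIG_*, MODEL_HAS_*) in place of real
      Kconfig symbols and cpucaps, and a plain array in place of the
      system_cpucaps bitmap:

```c
#include <stdbool.h>

/* Hypothetical build options; in the kernel these would come from Kconfig. */
#define MODEL_CONFIG_FOO 1
#define MODEL_CONFIG_BAR 0

enum { MODEL_HAS_FOO, MODEL_HAS_BAR, MODEL_HAS_BAZ, MODEL_NCAPS };

static bool model_system_cpucaps[MODEL_NCAPS];

/* One place that knows which caps can ever be set in this build. */
static inline bool model_cpucap_is_possible(int cap)
{
	switch (cap) {
	case MODEL_HAS_FOO:
		return MODEL_CONFIG_FOO;
	case MODEL_HAS_BAR:
		return MODEL_CONFIG_BAR;
	}
	return true; /* caps with no config dependency */
}

/* The low-level check consults it first, so wrappers need no config test. */
static inline bool model_cpus_have_cap(int cap)
{
	if (!model_cpucap_is_possible(cap))
		return false;
	return model_system_cpucaps[cap];
}

/* A now-trivial wrapper of the kind described above. */
bool model_system_supports_bar(void)
{
	return model_cpus_have_cap(MODEL_HAS_BAR);
}
```

      Because model_cpucap_is_possible() folds to a constant for a constant
      cap, a compiler can drop the bitmap test entirely for caps whose option
      is disabled, which is the code-size benefit the message refers to.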
      
      There should be no functional change as a result of this patch alone.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      de66cb37
    • Mark Rutland's avatar
      arm64: Factor out cpucap definitions · 484de085
      Mark Rutland authored
      For clarity it would be nice to factor cpucap manipulation out of
      <asm/cpufeature.h>, and the obvious place would be <asm/cpucap.h>, but
      this will clash somewhat with <generated/asm/cpucaps.h>.
      
      Rename <generated/asm/cpucaps.h> to <generated/asm/cpucap-defs.h>,
      matching what we do for <generated/asm/sysreg-defs.h>, and introduce a
      new <asm/cpucaps.h> which includes the generated header.
      
      Subsequent patches will fill out <asm/cpucaps.h>.
      
      There should be no functional change as a result of this patch.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      484de085
    • Mark Rutland's avatar
      arm64/arm: xen: enlighten: Fix KPTI checks · 20f3b8ea
      Mark Rutland authored
      When KPTI is in use, we cannot register a runstate region as XEN
      requires that this is always a valid VA, which we cannot guarantee. Due
      to this, xen_starting_cpu() must avoid registering each CPU's runstate
      region, and xen_guest_init() must avoid setting up features that depend
      upon it.
      
      We tried to ensure that in commit:
      
        f88af722 ("xen/arm: do not setup the runstate info page if kpti is enabled")
      
      ... where we added checks for xen_kernel_unmapped_at_usr(), which wraps
      arm64_kernel_unmapped_at_el0() on arm64 and is always false on 32-bit
      arm.
      
      Unfortunately, as xen_guest_init() is an early_initcall, this happens
      before secondary CPUs are booted and arm64 has finalized the
      ARM64_UNMAP_KERNEL_AT_EL0 cpucap which backs
      arm64_kernel_unmapped_at_el0(), and so this can subsequently be set as
      secondary CPUs are onlined. On a big.LITTLE system where the boot CPU
      does not require KPTI but some secondary CPUs do, this will result in
      xen_guest_init() initializing features that depend on the runstate
      region, and xen_starting_cpu() registering the runstate region on some
      CPUs before KPTI is subsequently enabled, resulting in the problems the
      aforementioned commit tried to avoid.
      
      Handle this more robustly by deferring the initialization of the
      runstate region until secondary CPUs have been initialized and the
      ARM64_UNMAP_KERNEL_AT_EL0 cpucap has been finalized. The per-cpu work is
      moved into a new hotplug starting function which is registered later
      when we're certain that KPTI will not be used.
      
      Fixes: f88af722 ("xen/arm: do not setup the runstate info page if kpti is enabled")
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Bertrand Marquis <bertrand.marquis@arm.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Stefano Stabellini <sstabellini@kernel.org>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      20f3b8ea
    • Mark Rutland's avatar
      clocksource/drivers/arm_arch_timer: Initialize evtstrm after finalizing cpucaps · 166b76a0
      Mark Rutland authored
      We attempt to initialize each CPU's arch_timer event stream in
      arch_timer_evtstrm_enable(), which we call from the
      arch_timer_starting_cpu() cpu hotplug callback which is registered early
      in boot. As this is registered before we initialize the system cpucaps,
      the test for ARM64_HAS_ECV will always be false for CPUs present at boot
      time, and will only be taken into account for CPUs onlined late
      (including those which are hotplugged out and in again).
      
      Due to this, CPUs present at boot time may not use the intended divider
      and scale factor to generate the event stream, and may differ from other
      CPUs.
      
      Correct this by only initializing the event stream after cpucaps have been
      finalized, registering a separate CPU hotplug callback for the event stream
      configuration. Since the caps must be finalized by this point, use
      cpus_have_final_cap() to verify this.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Acked-by: Marc Zyngier <maz@kernel.org>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      166b76a0
  2. 24 Sep, 2023 4 commits
    • Linus Torvalds's avatar
      Linux 6.6-rc3 · 6465e260
      Linus Torvalds authored
      6465e260
    • Linus Torvalds's avatar
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 8a511e7e
      Linus Torvalds authored
      Pull kvm fixes from Paolo Bonzini:
      "ARM:
      
         - Fix EL2 Stage-1 MMIO mappings where a random address was used
      
         - Fix SMCCC function number comparison when the SVE hint is set
      
        RISC-V:
      
         - Fix KVM_GET_REG_LIST API for ISA_EXT registers
      
         - Fix reading ISA_EXT register of a missing extension
      
         - Fix ISA_EXT register handling in get-reg-list test
      
         - Fix filtering of AIA registers in get-reg-list test
      
        x86:
      
         - Fixes for TSC_AUX virtualization
      
         - Stop zapping page tables asynchronously, since we don't zap them as
           often as before"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: SVM: Do not use user return MSR support for virtualized TSC_AUX
        KVM: SVM: Fix TSC_AUX virtualization setup
        KVM: SVM: INTERCEPT_RDTSCP is never intercepted anyway
        KVM: x86/mmu: Stop zapping invalidated TDP MMU roots asynchronously
        KVM: x86/mmu: Do not filter address spaces in for_each_tdp_mmu_root_yield_safe()
        KVM: x86/mmu: Open code leaf invalidation from mmu_notifier
        KVM: riscv: selftests: Selectively filter-out AIA registers
        KVM: riscv: selftests: Fix ISA_EXT register handling in get-reg-list
        RISC-V: KVM: Fix riscv_vcpu_get_isa_ext_single() for missing extensions
        RISC-V: KVM: Fix KVM_GET_REG_LIST API for ISA_EXT registers
        KVM: selftests: Assert that vasprintf() is successful
        KVM: arm64: nvhe: Ignore SVE hint in SMCCC function ID
        KVM: arm64: Properly return allocated EL2 VA from hyp_alloc_private_va_range()
      8a511e7e
    • Linus Torvalds's avatar
      Merge tag 'trace-v6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · 5edc6bb3
      Linus Torvalds authored
      Pull tracing fixes from Steven Rostedt:
      
       - Fix the "bytes" output of the per_cpu stat file
      
         The tracefs/per_cpu/cpu*/stats "bytes" was giving bogus values as the
         accounting was not accurate. It is supposed to show how many used
         bytes are still in the ring buffer, but even when the ring buffer was
         empty it would still show there were bytes used.
      
       - Fix a bug in eventfs where reading a dynamic event directory (open)
         and then creating a dynamic event that goes into that directory screws
         up the accounting.
      
         On close, the newly created event dentry will get a "dput" without
         ever having a "dget" done for it. The fix is to allocate an array on
         dir open to save what dentries were actually "dget" on, and what ones
         to "dput" on close.
      
      * tag 'trace-v6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        eventfs: Remember what dentries were created on dir open
        ring-buffer: Fix bytes info in per_cpu buffer stats
      5edc6bb3
    • Linus Torvalds's avatar
      Merge tag 'cxl-fixes-6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl · 2ad78f8c
      Linus Torvalds authored
      Pull cxl fixes from Dan Williams:
       "A collection of regression fixes, bug fixes, and some small cleanups
        to the Compute Express Link code.
      
        The regressions arrived in the v6.5 dev cycle and missed the v6.6
        merge window due to my personal absences this cycle. The most
        important fixes are for scenarios where the CXL subsystem fails to
        parse valid region configurations established by platform firmware.
        This is important because agreement between OS and BIOS on the CXL
        configuration is fundamental to implementing "OS native" error
        handling, i.e. address translation and component failure
        identification.
      
        Other important fixes are a driver load error when the BIOS lets the
        Linux PCI core handle AER events, but not CXL memory errors.
      
        The other fixes might have end user impact, but for now are only known
        to trigger in our test/emulation environment.
      
        Summary:
      
         - Fix multiple scenarios where platform firmware defined regions fail
           to be assembled by the CXL core.
      
         - Fix a spurious driver-load failure on platforms that enable OS
           native AER, but not OS native CXL error handling.
      
         - Fix a regression detecting "poison" commands when "security"
           commands are also defined.
      
         - Fix a cxl_test regression with the move to centralize CXL port
           register enumeration in the CXL core.
      
         - Miscellaneous small fixes and cleanups"
      
      * tag 'cxl-fixes-6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
        cxl/acpi: Annotate struct cxl_cxims_data with __counted_by
        cxl/port: Fix cxl_test register enumeration regression
        cxl/region: Refactor granularity select in cxl_port_setup_targets()
        cxl/region: Match auto-discovered region decoders by HPA range
        cxl/mbox: Fix CEL logic for poison and security commands
        cxl/pci: Replace host_bridge->native_aer with pcie_aer_is_native()
        PCI/AER: Export pcie_aer_is_native()
        cxl/pci: Fix appropriate checking for _OSC while handling CXL RAS registers
      2ad78f8c
  3. 23 Sep, 2023 13 commits
    • Linus Torvalds's avatar
      Merge tag 'gpio-fixes-for-v6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux · 3aba70ae
      Linus Torvalds authored
      Pull gpio fixes from Bartosz Golaszewski:
      
       - fix an invalid usage of __free(kfree) leading to kfreeing an
         ERR_PTR()
      
       - fix an irq domain leak in gpio-tb10x
      
       - MAINTAINERS update
      
      * tag 'gpio-fixes-for-v6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
        gpio: sim: fix an invalid __free() usage
        gpio: tb10x: Fix an error handling path in tb10x_gpio_probe()
        MAINTAINERS: gpio-regmap: make myself a maintainer of it
      3aba70ae
    • Linus Torvalds's avatar
      Merge tag 'mm-hotfixes-stable-2023-09-23-10-31' of... · 85eba5f1
      Linus Torvalds authored
      Merge tag 'mm-hotfixes-stable-2023-09-23-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
      
      Pull misc fixes from Andrew Morton:
       "13 hotfixes, 10 of which pertain to post-6.5 issues. The other three
        are cc:stable"
      
      * tag 'mm-hotfixes-stable-2023-09-23-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
        proc: nommu: fix empty /proc/<pid>/maps
        filemap: add filemap_map_order0_folio() to handle order0 folio
        proc: nommu: /proc/<pid>/maps: release mmap read lock
        mm: memcontrol: fix GFP_NOFS recursion in memory.high enforcement
        pidfd: prevent a kernel-doc warning
        argv_split: fix kernel-doc warnings
        scatterlist: add missing function params to kernel-doc
        selftests/proc: fixup proc-empty-vm test after KSM changes
        revert "scripts/gdb/symbols: add specific ko module load command"
        selftests: link libasan statically for tests with -fsanitize=address
        task_work: add kerneldoc annotation for 'data' argument
        mm: page_alloc: fix CMA and HIGHATOMIC landing on the wrong buddy list
        sh: mm: re-add lost __ref to ioremap_prot() to fix modpost warning
      85eba5f1
    • Linus Torvalds's avatar
      Merge tag '6.6-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 · 8565bdf8
      Linus Torvalds authored
      Pull smb client fixes from Steve French:
       "Six smb3 client fixes, including three for stable, from the SMB
        plugfest (testing event) this week:
      
         - Reparse point handling fix (found when investigating dir
           enumeration when fifo in dir)
      
         - Fix excessive thread creation for dir lease cleanup
      
         - UAF fix in negotiate path
      
         - remove duplicate error message mapping and fix confusing warning
           message
      
         - add dynamic trace point to improve debugging RDMA connection
           attempts"
      
      * tag '6.6-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
        smb3: fix confusing debug message
        smb: client: handle STATUS_IO_REPARSE_TAG_NOT_HANDLED
        smb3: remove duplicate error mapping
        cifs: Fix UAF in cifs_demultiplex_thread()
        smb3: do not start laundromat thread when dir leases disabled
        smb3: Add dynamic trace points for RDMA (smbdirect) reconnect
      8565bdf8
    • Linus Torvalds's avatar
      Merge tag 'i2c-for-6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · 5a4de7dc
      Linus Torvalds authored
      Pull i2c fixes from Wolfram Sang:
       "A set of I2C driver fixes. Mostly fixing resource leaks or sanity
        checks"
      
      * tag 'i2c-for-6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        i2c: xiic: Correct return value check for xiic_reinit()
        i2c: mux: gpio: Add missing fwnode_handle_put()
        i2c: mux: demux-pinctrl: check the return value of devm_kstrdup()
        i2c: designware: fix __i2c_dw_disable() in case master is holding SCL low
        i2c: i801: unregister tco_pdev in i801_probe() error path
      5a4de7dc
    • Charles Keepax's avatar
      mfd: cs42l43: Use correct macro for new-style PM runtime ops · eb72d520
      Charles Keepax authored
      The code was accidentally mixing new and old style macros, update the
      macros used to remove an unused function warning whilst building with
      no PM enabled in the config.
      
      Fixes: ace6d144 ("mfd: cs42l43: Add support for cs42l43 core driver")
      Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
      Link: https://lore.kernel.org/all/20230822114914.340359-1-ckeepax@opensource.cirrus.com/
      Reviewed-by: Nathan Chancellor <nathan@kernel.org>
      Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Acked-by: Lee Jones <lee@kernel.org>
      Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      eb72d520
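      The unused-function warning described in the cs42l43 commit above
      can be modelled in plain C. This is a rough sketch, assuming the
      usual difference between the two macro styles: an old-style macro
      expands to nothing when PM is disabled, leaving the static
      callbacks unreferenced, while a new-style macro always references
      them and a separate wrapper turns the ops pointer into NULL. All
      names here (MODEL_PM_ENABLED, OLD_STYLE_OPS, NEW_STYLE_OPS,
      PM_PTR_MODEL, pm_ops_model) are hypothetical and are not the
      kernel's macros.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a "PM enabled" config switch; 0 models a no-PM build. */
#define MODEL_PM_ENABLED 0

struct pm_ops_model {
	int (*runtime_suspend)(void);
	int (*runtime_resume)(void);
};

/* Old style: the callbacks vanish from the initializer when PM is off. */
#if MODEL_PM_ENABLED
#define OLD_STYLE_OPS(s, r) .runtime_suspend = (s), .runtime_resume = (r),
#else
#define OLD_STYLE_OPS(s, r)
#endif

/* New style: the callbacks are always referenced ... */
#define NEW_STYLE_OPS(s, r) .runtime_suspend = (s), .runtime_resume = (r),
/* ... and the ops pointer itself is dropped when PM is off. */
#define PM_PTR_MODEL(p) (MODEL_PM_ENABLED ? (p) : NULL)

static int demo_suspend(void) { return 0; }
static int demo_resume(void) { return 0; }

/* With OLD_STYLE_OPS here, the two callbacks above would be unused. */
static const struct pm_ops_model demo_ops = {
	NEW_STYLE_OPS(demo_suspend, demo_resume)
};

static const struct pm_ops_model *demo_driver_pm(void)
{
	return PM_PTR_MODEL(&demo_ops);
}
```

      Because the callbacks stay referenced, the compiler has nothing to
      warn about, and it remains free to discard the unreachable ops
      table in a no-PM build.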
    • Linus Torvalds's avatar
      Merge tag 'loongarch-fixes-6.6-1' of... · 93397d3a
      Linus Torvalds authored
      Merge tag 'loongarch-fixes-6.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
      
      Pull LoongArch fixes from Huacai Chen:
       "Fix lockdep, fix a boot failure, fix some build warnings, fix document
        links, and some cleanups"
      
      * tag 'loongarch-fixes-6.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
        docs/zh_CN/LoongArch: Update the links of ABI
        docs/LoongArch: Update the links of ABI
        LoongArch: Don't inline kasan_mem_to_shadow()/kasan_shadow_to_mem()
        kasan: Cleanup the __HAVE_ARCH_SHADOW_MAP usage
        LoongArch: Set all reserved memblocks on Node#0 at initialization
        LoongArch: Remove dead code in relocate_new_kernel
        LoongArch: Use _UL() and _ULL()
        LoongArch: Fix some build warnings with W=1
        LoongArch: Fix lockdep static memory detection
      93397d3a
    • Linus Torvalds's avatar
      Merge tag 's390-6.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · 2e3d3911
      Linus Torvalds authored
      Pull s390 fixes from Vasily Gorbik:
      
       - Fix potential string buffer overflow in hypervisor user-defined
         certificates handling
      
       - Update defconfigs
      
      * tag 's390-6.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        s390/cert_store: fix string length handling
        s390: update defconfigs
      2e3d3911
    • Linus Torvalds's avatar
      Merge tag 'iomap-6.6-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · 59c376d6
      Linus Torvalds authored
      Pull iomap fixes from Darrick Wong:
      
       - Return EIO on bad inputs to iomap_to_bh instead of BUGging, to deal
         less poorly with block device io racing with block device resizing
      
       - Fix a stale page data exposure bug introduced in 6.6-rc1 when
         unsharing a file range that is not in the page cache
      
      * tag 'iomap-6.6-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        iomap: convert iomap_unshare_iter to use large folios
        iomap: don't skip reading in !uptodate folios when unsharing a range
        iomap: handle error conditions more gracefully in iomap_to_bh
      59c376d6
    • Paolo Bonzini's avatar
      Merge tag 'kvm-riscv-fixes-6.6-1' of https://github.com/kvm-riscv/linux into HEAD · 5804c19b
      Paolo Bonzini authored
      KVM/riscv fixes for 6.6, take #1
      
      - Fix KVM_GET_REG_LIST API for ISA_EXT registers
      - Fix reading ISA_EXT register of a missing extension
      - Fix ISA_EXT register handling in get-reg-list test
      - Fix filtering of AIA registers in get-reg-list test
      5804c19b
    • Tom Lendacky's avatar
      KVM: SVM: Do not use user return MSR support for virtualized TSC_AUX · 916e3e5f
      Tom Lendacky authored
      When the TSC_AUX MSR is virtualized, the TSC_AUX value is swap type "B"
      within the VMSA. This means that the guest value is loaded on VMRUN and
      the host value is restored from the host save area on #VMEXIT.
      
      Since the value is restored on #VMEXIT, the KVM user return MSR support
      for TSC_AUX can be replaced by populating the host save area with the
      current host value of TSC_AUX. And, since TSC_AUX is not changed by Linux
      post-boot, the host save area can be set once in svm_hardware_enable().
      This eliminates the two WRMSR instructions associated with the user return
      MSR support.
      
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Message-Id: <d381de38eb0ab6c9c93dda8503b72b72546053d7.1694811272.git.thomas.lendacky@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      916e3e5f
    • Tom Lendacky's avatar
      KVM: SVM: Fix TSC_AUX virtualization setup · e0096d01
      Tom Lendacky authored
      The checks for virtualizing TSC_AUX occur during the vCPU reset processing
      path. However, at the time of initial vCPU reset processing, when the vCPU
      is first created, not all of the guest CPUID information has been set. In
      this case the RDTSCP and RDPID feature support for the guest is not in
      place and so TSC_AUX virtualization is not established.
      
      This continues for each vCPU created for the guest. On the first boot of
      an AP, vCPU reset processing is executed as a result of an APIC INIT
      event, this time with all of the guest CPUID information set, resulting
      in TSC_AUX virtualization being enabled, but only for the APs. The BSP
      always sees a TSC_AUX value of 0 which probably went unnoticed because,
      at least for Linux, the BSP TSC_AUX value is 0.
      
      Move the TSC_AUX virtualization enablement out of the init_vmcb() path and
      into the vcpu_after_set_cpuid() path to allow for proper initialization of
      the support after the guest CPUID information has been set.
      
      With the TSC_AUX virtualization support now in the vcpu_after_set_cpuid()
      path, the intercepts must be either cleared or set based on the guest
      CPUID input.
      
      Fixes: 296d5a17 ("KVM: SEV-ES: Use V_TSC_AUX if available instead of RDTSC/MSR_TSC_AUX intercepts")
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Message-Id: <4137fbcb9008951ab5f0befa74a0399d2cce809a.1694811272.git.thomas.lendacky@amd.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      e0096d01
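      The ordering problem described in the TSC_AUX setup fix above can
      be illustrated with a small userspace model. It is only a sketch
      of the sequence of events in the commit message: the struct, its
      fields, and the function names are hypothetical and do not
      correspond to KVM's actual code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a vCPU; field names are illustrative only. */
struct vcpu_model {
	bool guest_has_rdtscp;	/* known only after guest CPUID is set */
	bool tsc_aux_virt;	/* whether TSC_AUX virtualization is on */
};

/* Old placement: decide during reset, using whatever CPUID is known. */
static void reset_old(struct vcpu_model *v)
{
	v->tsc_aux_virt = v->guest_has_rdtscp;
}

/* New placement: decide whenever guest CPUID is set. */
static void after_set_cpuid_new(struct vcpu_model *v, bool has_rdtscp)
{
	v->guest_has_rdtscp = has_rdtscp;
	v->tsc_aux_virt = has_rdtscp;	/* set or clear based on CPUID */
}

/* BSP, old flow: reset at creation, CPUID set later, no further reset. */
static bool bsp_old_flow(void)
{
	struct vcpu_model v = { false, false };

	reset_old(&v);
	v.guest_has_rdtscp = true;
	return v.tsc_aux_virt;
}

/* AP, old flow: same as the BSP, plus a second reset on INIT. */
static bool ap_old_flow(void)
{
	struct vcpu_model v = { false, false };

	reset_old(&v);
	v.guest_has_rdtscp = true;
	reset_old(&v);		/* INIT event, CPUID is now populated */
	return v.tsc_aux_virt;
}

/* BSP, new flow: the decision follows the CPUID update directly. */
static bool bsp_new_flow(void)
{
	struct vcpu_model v = { false, false };

	after_set_cpuid_new(&v, true);
	return v.tsc_aux_virt;
}
```

      In this model the old flow enables the feature for the AP but not
      for the BSP, while the new flow enables it as soon as the CPUID
      information arrives.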
    • Paolo Bonzini's avatar
      KVM: SVM: INTERCEPT_RDTSCP is never intercepted anyway · e8d93d5d
      Paolo Bonzini authored
      svm_recalc_instruction_intercepts() is always called at least once
      before the vCPU is started, so the setting or clearing of the RDTSCP
      intercept can be dropped from the TSC_AUX virtualization support.
      
      Extracted from a patch by Tom Lendacky.
      
      Cc: stable@vger.kernel.org
      Fixes: 296d5a17 ("KVM: SEV-ES: Use V_TSC_AUX if available instead of RDTSC/MSR_TSC_AUX intercepts")
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      e8d93d5d
    • Sean Christopherson's avatar
      KVM: x86/mmu: Stop zapping invalidated TDP MMU roots asynchronously · 0df9dab8
      Sean Christopherson authored
      Stop zapping invalidated TDP MMU roots via work queue now that KVM
      preserves TDP MMU roots until they are explicitly invalidated.  Zapping
      roots asynchronously was effectively a workaround to avoid stalling a vCPU
      for an extended duration if a vCPU unloaded a root, which at the time
      happened whenever the guest toggled CR0.WP (a frequent operation for some
      guest kernels).
      
      While a clever hack, zapping roots via an unbound worker had subtle,
      unintended consequences on host scheduling, especially when zapping
      multiple roots, e.g. as part of a memslot.  Because the work of zapping a
      root is no longer bound to the task that initiated the zap, things like
      the CPU affinity and priority of the original task get lost.  Losing the
      affinity and priority can be especially problematic if unbound workqueues
      aren't affined to a small number of CPUs, as zapping multiple roots can
      cause KVM to heavily utilize the majority of CPUs in the system, *beyond*
      the CPUs KVM is already using to run vCPUs.
      
      When deleting a memslot via KVM_SET_USER_MEMORY_REGION, the async root
      zap can result in KVM occupying all logical CPUs for ~8ms, and result in
      high priority tasks not being scheduled in in a timely manner.  In v5.15,
      which doesn't preserve unloaded roots, the issues were even more noticeable
      as KVM would zap roots more frequently and could occupy all CPUs for 50ms+.
      
      Consuming all CPUs for an extended duration can lead to significant jitter
      throughout the system, e.g. on ChromeOS with virtio-gpu, deleting memslots
      is a semi-frequent operation as memslots are deleted and recreated with
      different host virtual addresses to react to host GPU drivers allocating
      and freeing GPU blobs.  On ChromeOS, the jitter manifests as audio blips
      during games due to the audio server's tasks not getting scheduled in
      promptly, despite the tasks having a high realtime priority.
      
      Deleting memslots isn't exactly a fast path and should be avoided when
      possible, and ChromeOS is working towards utilizing MAP_FIXED to avoid the
      memslot shenanigans, but KVM is squarely in the wrong.  Not to mention
      that removing the async zapping eliminates a non-trivial amount of
      complexity.
      
      Note, one of the subtle behaviors hidden behind the async zapping is that
      KVM would zap invalidated roots only once (ignoring partial zaps from
      things like mmu_notifier events).  Preserve this behavior by adding a flag
      to identify roots that are scheduled to be zapped versus roots that have
      already been zapped but not yet freed.
      
      Add a comment calling out why kvm_tdp_mmu_invalidate_all_roots() can
      encounter invalid roots, as it's not at all obvious why zapping
      invalidated roots shouldn't simply zap all invalid roots.
      
      Reported-by: Pattara Teerapong <pteerapong@google.com>
      Cc: David Stevens <stevensd@google.com>
      Cc: Yiwei Zhang <zzyiwei@google.com>
      Cc: Paul Hsia <paulhsia@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20230916003916.2545000-4-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      0df9dab8
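      The "zap invalidated roots only once" behavior described in the
      commit above can be sketched with a per-root flag. This is a
      simplified userspace model, not KVM code: root_model, its fields,
      and the two helper functions are hypothetical names chosen for
      illustration, and real zapping is replaced by a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a TDP MMU root. */
struct root_model {
	bool invalid;		/* root has been invalidated */
	bool scheduled_to_zap;	/* invalidated, but not yet zapped */
	int zap_count;		/* how many times it was fully zapped */
};

/* Mark every still-valid root invalid and schedule it for one zap. */
static void invalidate_all_roots(struct root_model *roots, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		/* Already-invalid roots may have been zapped; leave them. */
		if (roots[i].invalid)
			continue;
		roots[i].invalid = true;
		roots[i].scheduled_to_zap = true;
	}
}

/* Synchronously zap only the roots that are scheduled, exactly once. */
static void zap_invalidated_roots(struct root_model *roots, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!roots[i].scheduled_to_zap)
			continue;
		roots[i].scheduled_to_zap = false;
		roots[i].zap_count++;
	}
}

/* Two invalidate/zap rounds over the same roots; returns total zaps. */
static int demo_zap_once(void)
{
	struct root_model roots[2] = {
		{ false, false, 0 },
		{ false, false, 0 },
	};

	invalidate_all_roots(roots, 2);
	zap_invalidated_roots(roots, 2);
	/* Second round: roots are invalid but no longer scheduled. */
	invalidate_all_roots(roots, 2);
	zap_invalidated_roots(roots, 2);
	return roots[0].zap_count + roots[1].zap_count;
}
```

      The flag separates "invalid and waiting for its zap" from "invalid
      and already zapped", which is why the second round in the model
      does no extra work.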