- 20 Apr, 2023 11 commits
-
-
Will Deacon authored
* for-next/perf: (24 commits) KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege drivers/perf: hisi: add NULL check for name drivers/perf: hisi: Remove redundant initialized of pmu->name perf/arm-cmn: Fix port detection for CMN-700 arm64: pmuv3: dynamically map PERF_COUNT_HW_BRANCH_INSTRUCTIONS perf/arm-cmn: Validate cycles events fully Revert "ARM: mach-virt: Select PMUv3 driver by default" drivers/perf: apple_m1: Add Apple M2 support dt-bindings: arm-pmu: Add PMU compatible strings for Apple M2 cores perf: arm_cspmu: Fix variable dereference warning perf/amlogic: Fix config1/config2 parsing issue drivers/perf: Use devm_platform_get_and_ioremap_resource() kbuild, drivers/perf: remove MODULE_LICENSE in non-modules perf: qcom: Use devm_platform_get_and_ioremap_resource() perf: arm: Use devm_platform_get_and_ioremap_resource() perf/arm-cmn: Move overlapping wp_combine field ARM: mach-virt: Select PMUv3 driver by default ARM: perf: Allow the use of the PMUv3 driver on 32bit ARM ARM: Make CONFIG_CPU_V7 valid for 32bit ARMv8 implementations perf: pmuv3: Change GENMASK to GENMASK_ULL ...
-
Will Deacon authored
Although pKVM supports CPU PMU emulation for non-protected guests since 722625c6 ("KVM: arm64: Reenable pmu in Protected Mode"), this relies on the PMU driver probing before the host has de-privileged so that the 'kvm_arm_pmu_available' static key can still be enabled by patching the hypervisor text. As it happens, both of these events hang off device_initcall() but the PMU consistently won the race until 7755cec6 ("arm64: perf: Move PMUv3 driver to drivers/perf"). Since then, the host will fail to boot when pKVM is enabled: | hw perfevents: enabled with armv8_pmuv3_0 PMU driver, 7 counters available | kvm [1]: nVHE hyp BUG at: [<ffff8000090366e0>] __kvm_nvhe_handle_host_mem_abort+0x270/0x284! | kvm [1]: Cannot dump pKVM nVHE stacktrace: !CONFIG_PROTECTED_NVHE_STACKTRACE | kvm [1]: Hyp Offset: 0xfffea41fbdf70000 | Kernel panic - not syncing: HYP panic: | PS:a00003c9 PC:0000dbe04b0c66e0 ESR:00000000f2000800 | FAR:fffffbfffddfcf00 HPFAR:00000000010b0bf0 PAR:0000000000000000 | VCPU:0000000000000000 | CPU: 2 PID: 1 Comm: swapper/0 Not tainted 6.3.0-rc7-00083-g0bce6746d154 #1 | Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015 | Call trace: | dump_backtrace+0xec/0x108 | show_stack+0x18/0x2c | dump_stack_lvl+0x50/0x68 | dump_stack+0x18/0x24 | panic+0x13c/0x33c | nvhe_hyp_panic_handler+0x10c/0x190 | aarch64_insn_patch_text_nosync+0x64/0xc8 | arch_jump_label_transform+0x4c/0x5c | __jump_label_update+0x84/0xfc | jump_label_update+0x100/0x134 | static_key_enable_cpuslocked+0x68/0xac | static_key_enable+0x20/0x34 | kvm_host_pmu_init+0x88/0xa4 | armpmu_register+0xf0/0xf4 | arm_pmu_acpi_probe+0x2ec/0x368 | armv8_pmu_driver_init+0x38/0x44 | do_one_initcall+0xcc/0x240 Fix the race properly by deferring the de-privilege step to device_initcall_sync(). This will also be needed in future when probing IOMMU devices and allows us to separate the pKVM de-privilege logic from the core hypervisor initialisation path. Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Fuad Tabba <tabba@google.com> Cc: Marc Zyngier <maz@kernel.org> Fixes: 7755cec6 ("arm64: perf: Move PMUv3 driver to drivers/perf") Tested-by: Fuad Tabba <tabba@google.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230420123356.2708-1-will@kernel.orgSigned-off-by: Will Deacon <will@kernel.org>
-
Will Deacon authored
* for-next/mm: arm64: mm: always map fixmap at page granularity arm64: mm: move fixmap code to its own file arm64: add FIXADDR_TOT_{START,SIZE} Revert "Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()"" arm: uaccess: Remove memcpy_page_flushcache() mm,kfence: decouple kfence from page granularity mapping judgement
-
Will Deacon authored
* for-next/misc: arm64: kexec: include reboot.h arm64: delete dead code in this_cpu_set_vectors() arm64: kernel: Fix kernel warning when nokaslr is passed to commandline arm64: kgdb: Set PSTATE.SS to 1 to re-enable single-step arm64/sme: Fix some comments of ARM SME arm64/signal: Alloc tpidr2 sigframe after checking system_supports_tpidr2() arm64/signal: Use system_supports_tpidr2() to check TPIDR2 arm64: compat: Remove defines now in asm-generic arm64: kexec: remove unnecessary (void*) conversions arm64: armv8_deprecated: remove unnecessary (void*) conversions firmware: arm_sdei: Fix sleep from invalid context BUG
-
Will Deacon authored
* for-next/kdump: arm64: kdump: defer the crashkernel reservation for platforms with no DMA memory zones arm64: kdump: do not map crashkernel region specifically arm64: kdump : take off the protection on crashkernel memory region
-
Will Deacon authored
* for-next/ftrace: arm64: ftrace: Simplify get_ftrace_plt arm64: ftrace: Add direct call support ftrace: selftest: remove broken trace_direct_tramp ftrace: Make DIRECT_CALLS work WITH_ARGS and !WITH_REGS ftrace: Store direct called addresses in their ops ftrace: Rename _ftrace_direct_multi APIs to _ftrace_direct APIs ftrace: Remove the legacy _ftrace_direct API ftrace: Replace uses of _ftrace_direct APIs with _ftrace_direct_multi ftrace: Let unregister_ftrace_direct_multi() call ftrace_free_filter()
-
Will Deacon authored
* for-next/cpufeature: arm64/cpufeature: Use helper macro to specify ID register for capabilites arm64/cpufeature: Consistently use symbolic constants for min_field_value arm64/cpufeature: Pull out helper for CPUID register definitions
-
Will Deacon authored
* for-next/asm: arm64: uaccess: remove unnecessary earlyclobber arm64: uaccess: permit put_{user,kernel} to use zero register arm64: uaccess: permit __smp_store_release() to use zero register arm64: atomics: lse: improve cmpxchg implementation
-
Will Deacon authored
* for-next/acpi: ACPI: AGDI: Improve error reporting for problems during .remove()
-
Simon Horman authored
Include reboot.h in machine_kexec.c for declaration of machine_crash_shutdown. gcc-12 with W=1 reports: arch/arm64/kernel/machine_kexec.c:257:6: warning: no previous prototype for 'machine_crash_shutdown' [-Wmissing-prototypes] 257 | void machine_crash_shutdown(struct pt_regs *regs) No functional changes intended. Compile tested only. Signed-off-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230418-arm64-kexec-include-reboot-v1-1-8453fd4fb3fb@kernel.orgSigned-off-by: Will Deacon <will@kernel.org>
-
Dan Carpenter authored
The "slot" variable is an enum, and in this context it is an unsigned int. So the type means it can never be negative and also we never pass invalid data to this function. If something did pass invalid data then this check would be insufficient protection. Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/73859c9e-dea0-4764-bf01-7ae694fa2e37@kili.mountainSigned-off-by: Will Deacon <will@kernel.org>
-
- 17 Apr, 2023 6 commits
-
-
Mark Brown authored
When defining which value to look for in a system register field we currently manually specify the register, field shift, width and sign and the value to look for. This opens the potential for error with for example the wrong field width or sign being specified, an enumeration value for a different similarly named field or letting something be initialised to 0. Since we now generate defines for all the ID registers we now have named constants for all of these things generated from the system register description, meaning that we can generate initialisation for all the fields used in matching from a minimal specification of register, field and match value. This is both shorter and eliminates or makes build failures several potential errors. No change in the generated binary. Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20230303-arm64-cpufeature-helpers-v2-3-4c8f28a6f203@kernel.org [will: Drop explicit '.sign' assignment for BTI feature] Signed-off-by: Will Deacon <will@kernel.org>
-
Junhao He authored
When allocations fails that can be NULL now. If the name provided is NULL, then the initialization process of the PMU type and dev will be skipped in function perf_pmu_register(). Consequently, the PMU will not be able to register into the kernel. Moreover, in the case of unregister the PMU, the function device_del() will need to handle NULL pointers, which potentially can cause issues. So move this allocation above the cpuhp_state_add_instance() and directly return if it does fail. Signed-off-by: Junhao He <hejunhao3@huawei.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20230403081423.62460-3-hejunhao3@huawei.comSigned-off-by: Will Deacon <will@kernel.org>
-
Junhao He authored
"pmu->name" is initialized by perf_pmu_register() function, so remove the redundant initialized in hisi_pmu_init(). Signed-off-by: Junhao He <hejunhao3@huawei.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20230403081423.62460-2-hejunhao3@huawei.comSigned-off-by: Will Deacon <will@kernel.org>
-
Mark Brown authored
A number of the cpufeatures use raw numbers for the minimum field values specified rather than symbolic constants. In preparation for the use of helper macros replace all these with the appropriate constants. No change in the generated binary. Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20230303-arm64-cpufeature-helpers-v2-2-4c8f28a6f203@kernel.orgSigned-off-by: Will Deacon <will@kernel.org>
-
Mark Brown authored
We use the same structure to match hwcaps and CPU features so we can use the same helper to generate the fields required. Pull the portion of the current hwcaps helper that initialises the fields out into a separate define placed earlier in the file so we can use it for cpufeatures. No functional change. Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20230303-arm64-cpufeature-helpers-v2-1-4c8f28a6f203@kernel.orgSigned-off-by: Will Deacon <will@kernel.org>
-
Uwe Kleine-König authored
Returning an error value in a platform driver's remove callback results in a generic error message being emitted by the driver core, but otherwise it doesn't make a difference. The device goes away anyhow. So instead of triggering the generic platform error message, emit a more helpful message if a problem occurs and return 0 to suppress the generic message. This patch is a preparation for making platform remove callbacks return void. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Link: https://lore.kernel.org/r/20221014160623.467195-1-u.kleine-koenig@pengutronix.deSigned-off-by: Will Deacon <will@kernel.org>
-
- 14 Apr, 2023 3 commits
-
-
Pavankumar Kondeti authored
'Unknown kernel command line parameters "nokaslr", will be passed to user space' message is noticed in the dmesg when nokaslr is passed to the kernel commandline on ARM64 platform. This is because nokaslr param is handled by early cpufeature detection infrastructure and the parameter is never consumed by a kernel param handler. Fix this warning by providing a dummy kernel param handler for nokaslr. Signed-off-by: Pavankumar Kondeti <quic_pkondeti@quicinc.com> Link: https://lore.kernel.org/r/20230412043258.397455-1-quic_pkondeti@quicinc.comSigned-off-by: Will Deacon <will@kernel.org>
-
Robin Murphy authored
When the "extra device ports" configuration was first added, the additional mxp_device_port_connect_info registers were added around the existing mxp_mesh_port_connect_info registers. What I missed about CMN-700 is that it shuffled them around to remove this discontinuity. As such, tweak the definitions and factor out a helper for reading these registers so we can deal with this discrepancy easily, which does at least allow nicely tidying up the callsites. With this we can then also do the nice thing and skip accesses completely rather than relying on RES0 behaviour where we know the extra registers aren't defined. Fixes: 23760a01 ("perf/arm-cmn: Add CMN-700 support") Reported-by: Jing Zhang <renyu.zj@linux.alibaba.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/71d129241d4d7923cde72a0e5b4c8d2f6084525f.1681295193.git.robin.murphy@arm.comSigned-off-by: Will Deacon <will@kernel.org>
-
Sumit Garg authored
Currently only the first attempt to single-step has any effect. After that all further stepping remains "stuck" at the same program counter value. Refer to the ARM Architecture Reference Manual (ARM DDI 0487E.a) D2.12, PSTATE.SS=1 should be set at each step before transferring the PE to the 'Active-not-pending' state. The problem here is PSTATE.SS=1 is not set since the second single-step. After the first single-step, the PE transferes to the 'Inactive' state, with PSTATE.SS=0 and MDSCR.SS=1, thus PSTATE.SS won't be set to 1 due to kernel_active_single_step()=true. Then the PE transferes to the 'Active-pending' state when ERET and returns to the debugger by step exception. Before this patch: ================== Entering kdb (current=0xffff3376039f0000, pid 1) on processor 0 due to Keyboard Entry [0]kdb> [0]kdb> [0]kdb> bp write_sysrq_trigger Instruction(i) BP #0 at 0xffffa45c13d09290 (write_sysrq_trigger) is enabled addr at ffffa45c13d09290, hardtype=0 installed=0 [0]kdb> go $ echo h > /proc/sysrq-trigger Entering kdb (current=0xffff4f7e453f8000, pid 175) on processor 1 due to Breakpoint @ 0xffffad651a309290 [1]kdb> ss Entering kdb (current=0xffff4f7e453f8000, pid 175) on processor 1 due to SS trap @ 0xffffad651a309294 [1]kdb> ss Entering kdb (current=0xffff4f7e453f8000, pid 175) on processor 1 due to SS trap @ 0xffffad651a309294 [1]kdb> After this patch: ================= Entering kdb (current=0xffff6851c39f0000, pid 1) on processor 0 due to Keyboard Entry [0]kdb> bp write_sysrq_trigger Instruction(i) BP #0 at 0xffffc02d2dd09290 (write_sysrq_trigger) is enabled addr at ffffc02d2dd09290, hardtype=0 installed=0 [0]kdb> go $ echo h > /proc/sysrq-trigger Entering kdb (current=0xffff6851c53c1840, pid 174) on processor 1 due to Breakpoint @ 0xffffc02d2dd09290 [1]kdb> ss Entering kdb (current=0xffff6851c53c1840, pid 174) on processor 1 due to SS trap @ 0xffffc02d2dd09294 [1]kdb> ss Entering kdb (current=0xffff6851c53c1840, pid 174) on processor 1 due to SS trap @ 0xffffc02d2dd09298 [1]kdb> ss Entering kdb (current=0xffff6851c53c1840, pid 174) on processor 1 due to SS trap @ 0xffffc02d2dd0929c [1]kdb> Fixes: 44679a4f ("arm64: KGDB: Add step debugging support") Co-developed-by: Wei Li <liwei391@huawei.com> Signed-off-by: Wei Li <liwei391@huawei.com> Signed-off-by: Sumit Garg <sumit.garg@linaro.org> Tested-by: Douglas Anderson <dianders@chromium.org> Acked-by: Daniel Thompson <daniel.thompson@linaro.org> Tested-by: Daniel Thompson <daniel.thompson@linaro.org> Link: https://lore.kernel.org/r/20230202073148.657746-3-sumit.garg@linaro.orgSigned-off-by: Will Deacon <will@kernel.org>
-
- 12 Apr, 2023 3 commits
-
-
Dongxu Sun authored
When TIF_SME is clear, fpsimd_restore_current_state will disable SME trap during ret_to_user, then SME access trap is impossible in userspace, not SVE. Besides, fix typo: alocated->allocated. Signed-off-by: Dongxu Sun <sundongxu3@huawei.com> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20230317124915.1263-5-sundongxu3@huawei.comSigned-off-by: Will Deacon <will@kernel.org>
-
Dongxu Sun authored
Move tpidr2 sigframe allocation from under the checking of system_supports_sme() to the checking of system_supports_tpidr2(). Signed-off-by: Dongxu Sun <sundongxu3@huawei.com> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20230317124915.1263-3-sundongxu3@huawei.comSigned-off-by: Will Deacon <will@kernel.org>
-
Dongxu Sun authored
Since commit a9d69158("arm64/sme: Implement support for TPIDR2"), We introduced system_supports_tpidr2() for TPIDR2 handling. Let's use the specific check instead. No functional changes. Signed-off-by: Dongxu Sun <sundongxu3@huawei.com> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20230317124915.1263-2-sundongxu3@huawei.comSigned-off-by: Will Deacon <will@kernel.org>
-
- 11 Apr, 2023 11 commits
-
-
Baoquan He authored
In commit 03149563 ("arm64: Do not defer reserve_crashkernel() for platforms with no DMA memory zones"), reserve_crashkernel() is called much earlier in arm64_memblock_init() to avoid causing base apge mapping on platforms with no DMA meomry zones. With taking off protection on crashkernel memory region, no need to call reserve_crashkernel() specially in advance. The deferred invocation of reserve_crashkernel() in bootmem_init() can cover all cases. So revert the whole commit now. Signed-off-by: Baoquan He <bhe@redhat.com> Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com> Link: https://lore.kernel.org/r/20230407011507.17572-4-bhe@redhat.comSigned-off-by: Will Deacon <will@kernel.org>
-
Baoquan He authored
After taking off the protection functions on crashkernel memory region, there's no need to map crashkernel region with page granularity during linear mapping. With this change, the system can make use of block or section mapping on linear region to largely improve perforcemence during system bootup and running. Signed-off-by: Baoquan He <bhe@redhat.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com> Link: https://lore.kernel.org/r/20230407011507.17572-3-bhe@redhat.comSigned-off-by: Will Deacon <will@kernel.org>
-
Baoquan He authored
Problem: ======= On arm64, block and section mapping is supported to build page tables. However, currently it enforces to take base page mapping for the whole linear mapping if CONFIG_ZONE_DMA or CONFIG_ZONE_DMA32 is enabled and crashkernel kernel parameter is set. This will cause longer time of the linear mapping process during bootup and severe performance degradation during running time. Root cause: ========== On arm64, crashkernel reservation relies on knowing the upper limit of low memory zone because it needs to reserve memory in the zone so that devices' DMA addressing in kdump kernel can be satisfied. However, the upper limit of low memory on arm64 is variant. And the upper limit can only be decided late till bootmem_init() is called [1]. And we need to map the crashkernel region with base page granularity when doing linear mapping, because kdump needs to protect the crashkernel region via set_memory_valid(,0) after kdump kernel loading. However, arm64 doesn't support well on splitting the built block or section mapping due to some cpu reststriction [2]. And unfortunately, the linear mapping is done before bootmem_init(). To resolve the above conflict on arm64, the compromise is enforcing to take base page mapping for the entire linear mapping if crashkernel is set, and CONFIG_ZONE_DMA or CONFIG_ZONE_DMA32 is enabed. Hence performance is sacrificed. Solution: ========= Comparing with the base page mapping for the whole linear region, it's better to take off the protection on crashkernel memory region for the time being because the anticipated stamping on crashkernel memory region could only happen in a chance in one million, while the base page mapping for the whole linear region is mitigating arm64 systems with crashkernel set always. [1] https://lore.kernel.org/all/YrIIJkhKWSuAqkCx@arm.com/T/#u [2] https://lore.kernel.org/linux-arm-kernel/20190911182546.17094-1-nsaenzjulienne@suse.de/T/Signed-off-by: Baoquan He <bhe@redhat.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com> Link: https://lore.kernel.org/r/20230407011507.17572-2-bhe@redhat.comSigned-off-by: Will Deacon <will@kernel.org>
-
Teo Couprie Diaz authored
Some generic COMPAT definitions have been consolidated in asm-generic/compat.h by commit 84a0c977 ("asm-generic: compat: Cleanup duplicate definitions") Remove those that are already defined to the same value there from arm64 asm/compat.h. Signed-off-by: Teo Couprie Diaz <teo.coupriediaz@arm.com> Reviewed-by: Guo Ren <guoren@kernel.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20230314140038.252908-1-teo.coupriediaz@arm.comSigned-off-by: Will Deacon <will@kernel.org>
-
Mark Rutland authored
Today the fixmap code largely maps elements at PAGE_SIZE granularity, but we special-case the FDT mapping such that it can be mapped with 2M block mappings when 4K pages are in use. The original rationale for this was simplicity, but it has some unfortunate side-effects, and complicates portions of the fixmap code (i.e. is not so simple after all). The FDT can be up to 2M in size but is only required to have 8-byte alignment, and so it may straddle a 2M boundary. Thus when using 2M block mappings we may map up to 4M of memory surrounding the FDT. This is unfortunate as most of that memory will be unrelated to the FDT, and any pages which happen to share a 2M block with the FDT will by mapped with Normal Write-Back Cacheable attributes, which might not be what we want elsewhere (e.g. for carve-outs using Non-Cacheable attributes). The logic to handle mapping the FDT with 2M blocks requires some special cases in the fixmap code, and ties it to the early page table configuration by virtue of the SWAPPER_TABLE_SHIFT and SWAPPER_BLOCK_SIZE constants used to determine the granularity used to map the FDT. This patch simplifies the FDT logic and removes the unnecessary mappings of surrounding pages by always mapping the FDT at page granularity as with all other fixmap mappings. To do so we statically reserve multiple PTE tables to cover the fixmap VA range. Since the FDT can be at most 2M, for 4K pages we only need to allocate a single additional PTE table, and for 16K and 64K pages the existing single PTE table is sufficient. The PTE table allocation scales with the number of slots reserved in the fixmap, and so this also makes it easier to add more fixmap entries if we require those in future. Our VA layout means that the fixmap will always fall within a single PMD table (and consequently, within a single PUD/P4D/PGD entry), which we can verify at compile time with a static_assert(). With that assert a number of runtime warnings become impossible, and are removed. I've boot-tested this patch with both 4K and 64K pages. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Will Deacon <will@kernel.org> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Link: https://lore.kernel.org/r/20230406152759.4164229-4-mark.rutland@arm.comSigned-off-by: Will Deacon <will@kernel.org>
-
Mark Rutland authored
Over time, arm64's mm/mmu.c has become increasingly large and painful to navigate. Move the fixmap code to its own file where it can be understood in isolation. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Will Deacon <will@kernel.org> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Link: https://lore.kernel.org/r/20230406152759.4164229-3-mark.rutland@arm.comSigned-off-by: Will Deacon <will@kernel.org>
-
Mark Rutland authored
Currently arm64's FIXADDR_{START,SIZE} definitions only cover the runtime fixmap slots (and not the boot-time fixmap slots), but the code for creating the fixmap assumes that these definitions cover the entire fixmap range. This means that the ptdump boundaries are reported in a misleading way, missing the VA region of the runtime slots. In theory this could also cause the fixmap creation to go wrong if the boot-time fixmap slots end up spilling into a separate PMD entry, though luckily this is not currently the case in any configuration. While it seems like we could extend FIXADDR_{START,SIZE} to cover the entire fixmap area, core code relies upon these *only* covering the runtime slots. For example, fix_to_virt() and virt_to_fix() try to reject manipulation of the boot-time slots based upon FIXADDR_{START,SIZE}, while __fix_to_virt() and __virt_to_fix() can handle any fixmap slot. This patch follows the lead of x86 in commit: 55f49fcb ("x86/mm: Fix overlap of i386 CPU_ENTRY_AREA with FIX_BTMAP") ... and add new FIXADDR_TOT_{START,SIZE} definitions which cover the entire fixmap area, using these for the fixmap creation and ptdump code. As the boot-time fixmap slots are now rejected by fix_to_virt(), the early_fixmap_init() code is changed to consistently use __fix_to_virt(), as it already does in a few cases. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Will Deacon <will@kernel.org> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Link: https://lore.kernel.org/r/20230406152759.4164229-2-mark.rutland@arm.comSigned-off-by: Will Deacon <will@kernel.org>
-
Florent Revest authored
Following recent refactorings, the get_ftrace_plt function only ever gets called with addr = FTRACE_ADDR so its code can be simplified to always return the ftrace trampoline plt. Signed-off-by: Florent Revest <revest@chromium.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20230405180250.2046566-3-revest@chromium.orgSigned-off-by: Will Deacon <will@kernel.org>
-
Florent Revest authored
This builds up on the CALL_OPS work which extends the ftrace patchsite on arm64 with an ops pointer usable by the ftrace trampoline. This ops pointer is valid at all time. Indeed, it is either pointing to ftrace_list_ops or to the single ops which should be called from that patchsite. There are a few cases to distinguish: - If a direct call ops is the only one tracing a function: - If the direct called trampoline is within the reach of a BL instruction -> the ftrace patchsite jumps to the trampoline - Else -> the ftrace patchsite jumps to the ftrace_caller trampoline which reads the ops pointer in the patchsite and jumps to the direct call address stored in the ops - Else -> the ftrace patchsite jumps to the ftrace_caller trampoline and its ops literal points to ftrace_list_ops so it iterates over all registered ftrace ops, including the direct call ops and calls its call_direct_funcs handler which stores the direct called trampoline's address in the ftrace_regs and the ftrace_caller trampoline will return to that address instead of returning to the traced function Signed-off-by: Florent Revest <revest@chromium.org> Co-developed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20230405180250.2046566-2-revest@chromium.orgSigned-off-by: Will Deacon <will@kernel.org>
-
Will Deacon authored
Merge tag 'trace-direct-v6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace into for-next/ftrace Pull in ftrace trampoline updates from Steve so that we can implement support for direct calls for arm64 on top: tracing: Direct trampoline updates Updates to the direct trampoline to allow ARM64 to have direct trampolines.
-
Stephane Eranian authored
The mapping of perf_events generic hardware events to actual PMU events on ARM PMUv3 may not always be correct. This is in particular true for the PERF_COUNT_HW_BRANCH_INSTRUCTIONS event. Although the mapping points to an architected event, it may not always be available. This can be seen with a simple: $ perf stat -e branches sleep 0 Performance counter stats for 'sleep 0': <not supported> branches 0.001401081 seconds time elapsed Yet the hardware does have an event that could be used for branches. Dynamically check for a supported hardware event which can be used for PERF_COUNT_HW_BRANCH_INSTRUCTIONS at mapping time. And with that: $ perf stat -e branches sleep 0 Performance counter stats for 'sleep 0': 166,739 branches 0.000832163 seconds time elapsed Co-developed-by: Stephane Eranian <eranian@google.com> Signed-off-by: Stephane Eranian <eranian@google.com> Co-developed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Co-developed-by: Peter Newman <peternewman@google.com> Signed-off-by: Peter Newman <peternewman@google.com> Link: https://lore.kernel.org/all/YvunKCJHSXKz%2FkZB@FVFF77S0Q05N Link: https://lore.kernel.org/r/20230411093809.657501-1-peternewman@google.comSigned-off-by: Will Deacon <will@kernel.org>
-
- 06 Apr, 2023 1 commit
-
-
Robin Murphy authored
DTC cycle count events don't have anything to validate or initialise in themselves, but we should not forget to still validate their whole group context. Otherwise, we may fail to correctly reject a contrived group containing an impossible number of cycles events. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/3124e8c276a1f513c1a415dc839ca4181b3c8bc8.1680522545.git.robin.murphy@arm.comSigned-off-by: Will Deacon <will@kernel.org>
-
- 30 Mar, 2023 1 commit
-
-
Will Deacon authored
This reverts commit b7d9aae4. With the Qualcomm remoteproc driver now modified to use a carveout memory region in 57f72170 ("remoteproc: qcom_q6v5_mss: Use a carveout to authenticate modem headers"), we can reinstate c44094ee ("arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()") which relaxes the arm64 implementation of arch_dma_prep_coherent() to perform only a data cache clean operation, rather than a clean-and-invalidate. Signed-off-by: Will Deacon <will@kernel.org>
-
- 28 Mar, 2023 4 commits
-
-
Mark Rutland authored
Currently the asm constraints for __get_mem_asm() mark the value register as an earlyclobber operand. This means that the compiler can't reuse the same register for both the address and value, even when the value is not subsequently used. There's no need for the value register to be marked as earlyclobber, as it's only written to after the address register is consumed, even when the access faults. Remove the unnecessary earlyclobber. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20230314153700.787701-5-mark.rutland@arm.comSigned-off-by: Will Deacon <will@kernel.org>
-
Mark Rutland authored
Currently the asm constraints for __put_mem_asm() require that the value is placed in a "real" GPR (i.e. one other than [XW]ZR or SP). This means that for cases such as: __put_user(0, addr) ... the compiler has to move '0' into "real" GPR, e.g. mov xN, #0 sttr xN, [<addr>] This is unfortunate, as using the zero register would require fewer instructions and save a "real" GPR for other usage, allowing the compiler to generate: sttr xzr, [<addr>] Modify the asm constaints for __put_mem_asm() to permit the use of the zero register for the value. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20230314153700.787701-4-mark.rutland@arm.comSigned-off-by: Will Deacon <will@kernel.org>
-
Mark Rutland authored
Currently the asm constraints for __smp_store_release() require that the value is placed in a "real" GPR (i.e. one other than [XW]ZR or SP). This means that for cases such as: __smp_store_release(ptr, 0) ... the compiler has to move '0' into "real" GPR, e.g. mov xN, #0 stlr xN, [<addr>] This is unfortunate, as using the zero register would require fewer instructions and save a "real" GPR for other usage, allowing the compiler to generate: stlr xzr, [<addr>] Modify the asm constaints for __smp_store_release() to permit the use of the zero register for the value. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20230314153700.787701-3-mark.rutland@arm.comSigned-off-by: Will Deacon <will@kernel.org>
-
Mark Rutland authored
For historical reasons, the LSE implementation of cmpxchg*() hard-codes the GPRs to use, and shuffles registers around with MOVs. This is no longer necessary, and can be simplified. When the LSE cmpxchg implementation was added in commit: c342f782 ("arm64: cmpxchg: patch in lse instructions when supported by the CPU") ... the LL/SC implementation of cmpxchg() would be placed out-of-line, and the in-line assembly for cmpxchg would default to: NOP BL <ll_sc_cmpxchg*_implementation> NOP The LL/SC implementation of each cmpxchg() function accepted arguments as per AAPCS64 rules, to it was necessary to place the pointer in x0, the older value in X1, and the new value in x2, and acquire the return value from x0. The LL/SC implementation required a temporary register (e.g. for the STXR status value). As the LL/SC implementation preserved the old value, the LSE implementation does likewise. Since commit: addfc386 ("arm64: atomics: avoid out-of-line ll/sc atomics") ... the LSE and LL/SC implementations of cmpxchg are inlined as separate asm blocks, with another branch choosing between thw two. Due to this, it is no longer necessary for the LSE implementation to match the register constraints of the LL/SC implementation. This was partially dealt with by removing the hard-coded use of x30 in commit: 3337cb5a ("arm64: avoid using hard-coded registers for LSE atomics") ... but we didn't clean up the hard-coding of x0, x1, and x2. This patch simplifies the LSE implementation of cmpxchg, removing the register shuffling and directly clobbering the 'old' argument. This gives the compiler greater freedom for register allocation, and avoids redundant work. The new constraints permit 'old' (Rs) and 'new' (Rt) to be allocated to the same register when the initial values of the two are the same, e.g. resulting in: CAS X0, X0, [X1] This is safe as Rs is only written back after the initial values of Rs and Rt are consumed, and there are no UNPREDICTABLE behaviours to avoid when Rs == Rt. The new constraints also permit 'new' to be allocated to the zero register, avoiding a MOV in a few cases. The same cannot be done for 'old' as it is both an input and output, and any caller of cmpxchg() should care about the output value. Note that for CAS* the use of the zero register never affects the ordering (while for SWP* the use of the zero regsiter for the 'old' value drops any ACQUIRE semantic). Compared to v6.2-rc4, a defconfig vmlinux is ~116KiB smaller, though the resulting Image is the same size due to internal alignment and padding: [mark@lakrids:~/src/linux]% ls -al vmlinux-* -rwxr-xr-x 1 mark mark 137269304 Jan 16 11:59 vmlinux-after -rwxr-xr-x 1 mark mark 137387936 Jan 16 10:54 vmlinux-before [mark@lakrids:~/src/linux]% ls -al Image-* -rw-r--r-- 1 mark mark 38711808 Jan 16 11:59 Image-after -rw-r--r-- 1 mark mark 38711808 Jan 16 10:54 Image-before This patch does not touch cmpxchg_double*() as that requires contiguous register pairs, and separate patches will replace it with cmpxchg128*(). There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20230314153700.787701-2-mark.rutland@arm.comSigned-off-by: Will Deacon <will@kernel.org>
-