  1. 26 Apr, 2024 1 commit
  2. 18 Mar, 2024 1 commit
    • ARM64: Dynamically allocate cpumasks and increase supported CPUs to 512 · 3fbd56f0
      Christoph Lameter (Ampere) authored
        [ a.k.a. Revert "Revert "ARM64: Dynamically allocate cpumasks and
          increase supported CPUs to 512""; originally reverted because of a
          bug in the cpufreq-dt code not using zalloc_cpumask_var() ]
      
      Currently defconfig selects NR_CPUS=256, but some vendors (e.g. Ampere
      Computing) are planning to ship systems with 512 CPUs. So that all CPUs on
      these systems can be used with defconfig, we'd like to bump NR_CPUS to 512.
      Therefore this patch increases the default NR_CPUS from 256 to 512.
      
      As increasing NR_CPUS will increase the size of cpumasks, there's a fear that
      this might have a significant impact on stack usage due to code which places
      cpumasks on the stack. To mitigate that concern, we can select
      CPUMASK_OFFSTACK. As that doesn't seem to be a problem today with
      NR_CPUS=256, we only select this when NR_CPUS > 256.
      
      CPUMASK_OFFSTACK configures the cpumasks in the kernel to be
      dynamically allocated. This was used in the X86 architecture in the
      past to enable support for larger CPU configurations up to 8k cpus.
      
      With that it becomes possible to dynamically size the allocation of
      the cpu bitmaps depending on the quantity of processors detected on
      bootup. Memory used for cpumasks will increase if the kernel is
      run on a machine with more cores.
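      
      A minimal sketch (generic illustration, not code from this patch) of what
      CPUMASK_OFFSTACK changes for callers: cpumask_var_t becomes a pointer whose
      bitmap must be allocated, and zeroed, before use, rather than a full
      NR_CPUS-bit array on the stack; the cpufreq-dt bug noted above came from
      not using zalloc_cpumask_var():
      
        #include <linux/cpumask.h>
        #include <linux/slab.h>
      
        /*
         * With CPUMASK_OFFSTACK=n, cpumask_var_t is a full NR_CPUS-bit array
         * (64 bytes at NR_CPUS=512) wherever it is declared, including on the
         * stack.  With CPUMASK_OFFSTACK=y it is a pointer, and the bitmap is
         * allocated at run time, sized for the CPUs actually present.
         */
        static int example_count_online_cpus(void)
        {
        	cpumask_var_t mask;
        	int n;
      
        	/* zalloc_cpumask_var() also zeroes the bitmap; plain
        	 * alloc_cpumask_var() does not. */
        	if (!zalloc_cpumask_var(&mask, GFP_KERNEL))
        		return -ENOMEM;
      
        	cpumask_copy(mask, cpu_online_mask);
        	n = cpumask_weight(mask);
      
        	free_cpumask_var(mask);
        	return n;
        }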
      
      Further increases may be needed if ARM processor vendors start
      supporting more processors. Given the current inflationary trends
      in core counts from multiple processor manufacturers this may occur.
      
      There are minor regressions for hackbench. The kernel data size
      for 512 cpus is smaller with offstack than with onstack.
      
      Benchmark results using hackbench average over 10 runs of
      
       	hackbench -s 512 -l 2000 -g 15 -f 25 -P
      
      on Altra 80 Core
      
      Support for 256 CPUs on stack. Baseline
      
       	7.8564 sec
      
      Support for 512 CPUs on stack.
      
       	7.8713 sec + 0.18%
      
      512 CPUs offstack
      
       	7.8916 sec + 0.44%
      
      Kernel size comparison:
      
          text		   data	    filename				Difference to onstack256 baseline
      25755648	9589248	    vmlinuz-6.8.0-rc4-onstack256
      25755648	9607680	    vmlinuz-6.8.0-rc4-onstack512	+0.19%
      25755648	9603584	    vmlinuz-6.8.0-rc4-offstack512	+0.14%
      Tested-by: Eric Mackay <eric.mackay@oracle.com>
      Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Signed-off-by: Christoph Lameter (Ampere) <cl@linux.com>
      Acked-by: Mark Rutland <mark.rutland@arm.com>
      Link: https://lore.kernel.org/r/37099a57-b655-3b3a-56d0-5f7fbd49d7db@gentwo.org
      Link: https://lore.kernel.org/r/20240314125457.186678-1-m.szyprowski@samsung.com
      [catalin.marinas@arm.com: use 'select' instead of duplicating 'config CPUMASK_OFFSTACK']
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
  3. 13 Mar, 2024 1 commit
  4. 11 Mar, 2024 1 commit
  5. 07 Mar, 2024 1 commit
    • ARM64: Dynamically allocate cpumasks and increase supported CPUs to 512 · 0499a783
      Christoph Lameter (Ampere) authored
      Currently defconfig selects NR_CPUS=256, but some vendors (e.g. Ampere
      Computing) are planning to ship systems with 512 CPUs. So that all CPUs on
      these systems can be used with defconfig, we'd like to bump NR_CPUS to 512.
      Therefore this patch increases the default NR_CPUS from 256 to 512.
      
      As increasing NR_CPUS will increase the size of cpumasks, there's a fear that
      this might have a significant impact on stack usage due to code which places
      cpumasks on the stack. To mitigate that concern, we can select
      CPUMASK_OFFSTACK. As that doesn't seem to be a problem today with
      NR_CPUS=256, we only select this when NR_CPUS > 256.
      
      CPUMASK_OFFSTACK configures the cpumasks in the kernel to be
      dynamically allocated. This was used in the X86 architecture in the
      past to enable support for larger CPU configurations up to 8k cpus.
      
      With that it becomes possible to dynamically size the allocation of
      the cpu bitmaps depending on the quantity of processors detected on
      bootup. Memory used for cpumasks will increase if the kernel is
      run on a machine with more cores.
      
      Further increases may be needed if ARM processor vendors start
      supporting more processors. Given the current inflationary trends
      in core counts from multiple processor manufacturers this may occur.
      
      There are minor regressions for hackbench. The kernel data size
      for 512 cpus is smaller with offstack than with onstack.
      
      Benchmark results using hackbench average over 10 runs of
      
       	hackbench -s 512 -l 2000 -g 15 -f 25 -P
      
      on Altra 80 Core
      
      Support for 256 CPUs on stack. Baseline
      
       	7.8564 sec
      
      Support for 512 CPUs on stack.
      
       	7.8713 sec + 0.18%
      
      512 CPUs offstack
      
       	7.8916 sec + 0.44%
      
      Kernel size comparison:
      
          text		   data	    filename				Difference to onstack256 baseline
      25755648	9589248	    vmlinuz-6.8.0-rc4-onstack256
      25755648	9607680	    vmlinuz-6.8.0-rc4-onstack512	+0.19%
      25755648	9603584	    vmlinuz-6.8.0-rc4-offstack512	+0.14%
      Tested-by: Eric Mackay <eric.mackay@oracle.com>
      Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Signed-off-by: Christoph Lameter (Ampere) <cl@linux.com>
      Acked-by: Mark Rutland <mark.rutland@arm.com>
      Link: https://lore.kernel.org/r/37099a57-b655-3b3a-56d0-5f7fbd49d7db@gentwo.org
      [catalin.marinas@arm.com: use 'select' instead of duplicating 'config CPUMASK_OFFSTACK']
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
  6. 06 Mar, 2024 1 commit
  7. 28 Feb, 2024 1 commit
  8. 24 Feb, 2024 1 commit
    • kexec: split crashkernel reservation code out from crash_core.c · 85fcde40
      Baoquan He authored
      Patch series "Split crash out from kexec and clean up related config
      items", v3.
      
      Motivation:
      =============
      Previously, LKP reported a build error. When investigating, it could not
      be resolved reasonably with the present messy kdump config items.
      
       https://lore.kernel.org/oe-kbuild-all/202312182200.Ka7MzifQ-lkp@intel.com/
      
      The kdump (crash dumping) related config items could cause confusion:
      
      Firstly,
      
      CRASH_CORE enables code including
       - crashkernel reservation;
       - elfcorehdr updating;
       - vmcoreinfo exporting;
       - crash hotplug handling;
      
      Now fadump of powerpc, kcore dynamic debugging and kdump all select
      CRASH_CORE, while:
       - fadump needs crashkernel parsing, vmcoreinfo exporting, and accessing
         global variable 'elfcorehdr_addr';
       - kcore only needs vmcoreinfo exporting;
       - kdump needs all of the current kernel/crash_core.c.
      
      So enabling only PROC_KCORE or FA_DUMP will enable CRASH_CORE, which
      misleads people into thinking crash dumping is enabled, when actually it's not.
      
      Secondly,
      
      It's not reasonable to allow KEXEC_CORE to select CRASH_CORE.
      
      Because KEXEC_CORE enables code which allocates control pages, copies
      kexec/kdump segments, and prepares for switching. This code is shared
      by both kexec reboot and kdump. We may want kexec reboot but disable
      kdump; in that case, CRASH_CORE should not be selected.
      
       --------------------
       CONFIG_CRASH_CORE=y
       CONFIG_KEXEC_CORE=y
       CONFIG_KEXEC=y
       CONFIG_KEXEC_FILE=y
       ---------------------
      
      Thirdly,
      
      It's not reasonable to allow CRASH_DUMP to select KEXEC_CORE.
      
      That could leave KEXEC_CORE and CRASH_DUMP enabled independently of
      KEXEC or KEXEC_FILE. However, without KEXEC or KEXEC_FILE, the built-in
      KEXEC_CORE code doesn't make any sense, because no kernel loading or
      switching will happen to utilize it.
       ---------------------
       CONFIG_CRASH_CORE=y
       CONFIG_KEXEC_CORE=y
       CONFIG_CRASH_DUMP=y
       ---------------------
      
      What is worse, in this case, on arches sh and arm, KEXEC relies on MMU
      while CRASH_DUMP can still be enabled when !MMU, and then a compile
      error is seen, as the lkp test robot reported in the above link.
      
       ------arch/sh/Kconfig------
       config ARCH_SUPPORTS_KEXEC
               def_bool MMU
      
       config ARCH_SUPPORTS_CRASH_DUMP
               def_bool BROKEN_ON_SMP
       ---------------------------
      
      Changes:
      ===========
      1, split out crash_reserve.c from crash_core.c;
      2, split out vmcore_info.c from crash_core.c;
      3, move crash related code in kexec_core.c into crash_core.c;
      4, remove dependency of FA_DUMP on CRASH_DUMP;
      5, clean up kdump related config items;
      6, wrap up crash code in crash related ifdefs on all 8 arches
         which support crash dumping, except for ppc;
      
      Achievement:
      ===========
      With the above changes, I can rearrange the config item logic as below (the
      right item depends on or is selected by the left item):
      
          PROC_KCORE -----------> VMCORE_INFO
      
                     |----------> VMCORE_INFO
          FA_DUMP----|
                     |----------> CRASH_RESERVE
      
                                                          ---->VMCORE_INFO
                                                         /
                                                         |---->CRASH_RESERVE
          KEXEC      --|                                /|
                       |--> KEXEC_CORE--> CRASH_DUMP-->/-|---->PROC_VMCORE
          KEXEC_FILE --|                               \ |
                                                         \---->CRASH_HOTPLUG
      
      
          KEXEC      --|
                       |--> KEXEC_CORE (for kexec reboot only)
          KEXEC_FILE --|
      
      Test
      ========
      On all 8 architectures, including x86_64, arm64, s390x, sh, arm, mips,
      riscv and loongarch, I did the three cases of config item setting below,
      and all builds passed. Take the configs on x86_64 as an example here:
      
      (1) Both CONFIG_KEXEC and KEXEC_FILE are unset, then all kexec/kdump
      items are unset automatically:
      # Kexec and crash features
      # CONFIG_KEXEC is not set
      # CONFIG_KEXEC_FILE is not set
      # end of Kexec and crash features
      
      (2) set CONFIG_KEXEC_FILE and 'make olddefconfig':
      ---------------
      # Kexec and crash features
      CONFIG_CRASH_RESERVE=y
      CONFIG_VMCORE_INFO=y
      CONFIG_KEXEC_CORE=y
      CONFIG_KEXEC_FILE=y
      CONFIG_CRASH_DUMP=y
      CONFIG_CRASH_HOTPLUG=y
      CONFIG_CRASH_MAX_MEMORY_RANGES=8192
      # end of Kexec and crash features
      ---------------
      
      (3) unset CONFIG_CRASH_DUMP in case 2 and execute 'make olddefconfig':
      ------------------------
      # Kexec and crash features
      CONFIG_KEXEC_CORE=y
      CONFIG_KEXEC_FILE=y
      # end of Kexec and crash features
      ------------------------
      
      Note:
      For ppc, it needs investigation to make clear how to split out the crash
      code in the arch folder. Hope Hari and Pingfan can help have a look and
      see if it's doable. For now, I make it either have both kexec and crash
      enabled, or disable both of them altogether.
      
      
      This patch (of 14):
      
      Both kdump and fa_dump of ppc rely on crashkernel reservation.  Move the
      relevant code into separate files: crash_reserve.c and
      include/linux/crash_reserve.h.
      
      Also add the config item CRASH_RESERVE to control enabling of that code,
      and update the config items which have a relationship with crashkernel
      reservation.
      
      Also change the ifdeffery from CONFIG_CRASH_CORE to CONFIG_CRASH_RESERVE
      where those scopes are only related to crashkernel reservation.
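      
      Schematically (illustration only; reserve_crashkernel() stands in for any
      crashkernel-reservation-only scope), the ifdeffery change looks like:
      
        /* Before: reservation guarded by the catch-all kdump symbol */
        #ifdef CONFIG_CRASH_CORE
        	reserve_crashkernel();
        #endif
      
        /* After: reservation guarded by the dedicated symbol split out here */
        #ifdef CONFIG_CRASH_RESERVE
        	reserve_crashkernel();
        #endif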
      
      Also rename arch/XXX/include/asm/{crash_core.h => crash_reserve.h} on
      arm64, x86 and risc-v, because those architectures' crash_core.h is only
      related to crashkernel reservation.
      
      [akpm@linux-foundation.org: s/CRASH_RESEERVE/CRASH_RESERVE/, per Klara Modin]
      Link: https://lkml.kernel.org/r/20240124051254.67105-1-bhe@redhat.com
      Link: https://lkml.kernel.org/r/20240124051254.67105-2-bhe@redhat.com
      Signed-off-by: Baoquan He <bhe@redhat.com>
      Acked-by: Hari Bathini <hbathini@linux.ibm.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Pingfan Liu <piliu@redhat.com>
      Cc: Klara Modin <klarasmodin@gmail.com>
      Cc: Michael Kelley <mhklinux@outlook.com>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Yang Li <yang.lee@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  9. 22 Feb, 2024 3 commits
    • arm64: Kconfig: clean up tautological LLVM version checks · 634e4ff9
      Nathan Chancellor authored
      Now that the minimum supported version of LLVM for building the kernel has
      been bumped to 13.0.1, several conditions become tautologies, as they will
      always be true because the build will fail during the configuration stage
      for older LLVM versions.  Drop them, as they are unnecessary.
      
      Link: https://lkml.kernel.org/r/20240125-bump-min-llvm-ver-to-13-0-1-v1-5-f5ff9bda41c5@kernel.org
      Signed-off-by: Nathan Chancellor <nathan@kernel.org>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: "Aneesh Kumar K.V (IBM)" <aneesh.kumar@kernel.org>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Borislav Petkov (AMD) <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Conor Dooley <conor@kernel.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Masahiro Yamada <masahiroy@kernel.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Nicolas Schier <nicolas@fjasle.eu>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • arch and include: update LLVM Phabricator links · fafdea34
      Nathan Chancellor authored
      reviews.llvm.org was LLVM's Phabricator instance for code review.  It has
      been abandoned in favor of GitHub pull requests.  While the majority of
      links in the kernel sources still work because of the work Fangrui has
      done turning the dynamic Phabricator instance into a static archive, there
      are some issues with that work, so preemptively convert all the links in
      the kernel sources to point to the commit on GitHub.
      
      Most of the commits have the corresponding differential review link in the
      commit message itself so there should not be any loss of fidelity in the
      relevant information.
      
      Link: https://discourse.llvm.org/t/update-on-github-pull-requests/71540/172
      Link: https://lkml.kernel.org/r/20240109-update-llvm-links-v1-2-eb09b59db071@kernel.org
      Signed-off-by: Nathan Chancellor <nathan@kernel.org>
      Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Acked-by: Fangrui Song <maskray@google.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andrii Nakryiko <andrii@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Mykola Lysenko <mykolal@fb.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • arm64/mm: wire up PTE_CONT for user mappings · 4602e575
      Ryan Roberts authored
      With the ptep API sufficiently refactored, we can now introduce a new
      "contpte" API layer, which transparently manages the PTE_CONT bit for user
      mappings.
      
      In this initial implementation, only suitable batches of PTEs, set via
      set_ptes(), are mapped with the PTE_CONT bit.  Any subsequent modification
      of individual PTEs will cause an "unfold" operation to repaint the contpte
      block as individual PTEs before performing the requested operation. 
      While a modification of a single PTE could cause the block of PTEs to
      which it belongs to become eligible for "folding" into a contpte entry,
      "folding" is not performed in this initial implementation due to the costs
      of checking the requirements are met.  Due to this, contpte mappings will
      degrade back to normal pte mappings over time if/when protections are
      changed.  This will be solved in a future patch.
      
      Since a contpte block only has a single access and dirty bit, the semantic
      here changes slightly; when getting a pte (e.g.  ptep_get()) that is part
      of a contpte mapping, the access and dirty information are pulled from the
      block (so all ptes in the block return the same access/dirty info).  When
      changing the access/dirty info on a pte (e.g.  ptep_set_access_flags())
      that is part of a contpte mapping, this change will affect the whole
      contpte block.  This works fine in practice since we guarantee that
      only a single folio is mapped by a contpte block, and the core-mm tracks
      access/dirty information per folio.
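      
      As a rough sketch of that semantic (illustration only; CONT_PTES and the
      __ptep_get() helper are assumed from the refactored ptep API, and the
      function name here is made up), reading one PTE of a contpte block gathers
      the access/dirty state from the whole block:
      
        #include <linux/align.h>
        #include <linux/pgtable.h>
      
        /*
         * Sketch only: return the PTE at @ptep, with the access/dirty bits
         * OR-ed in from every PTE of the surrounding contpte block, so that
         * all entries in the block report the same access/dirty state.
         */
        static pte_t contpte_block_get(pte_t *ptep, pte_t orig_pte)
        {
        	unsigned int i;
        	pte_t pte;
      
        	/* Step back to the first PTE of the contiguous block. */
        	ptep = PTR_ALIGN_DOWN(ptep, CONT_PTES * sizeof(*ptep));
      
        	for (i = 0; i < CONT_PTES; i++, ptep++) {
        		pte = __ptep_get(ptep);
      
        		if (pte_dirty(pte))
        			orig_pte = pte_mkdirty(orig_pte);
        		if (pte_young(pte))
        			orig_pte = pte_mkyoung(orig_pte);
        	}
      
        	return orig_pte;
        }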
      
      In order for the public functions, which used to be pure inline, to
      continue to be callable by modules, export all the contpte_* symbols that
      are now called by those public inline functions.
      
      The feature is enabled/disabled with the ARM64_CONTPTE Kconfig parameter
      at build time.  It defaults to enabled as long as its dependency,
      TRANSPARENT_HUGEPAGE, is also enabled.  The core-mm depends upon
      TRANSPARENT_HUGEPAGE to be able to allocate large folios, so if it's not
      enabled, then there is no chance of meeting the physical contiguity
      requirement for contpte mappings.
      
      Link: https://lkml.kernel.org/r/20240215103205.2607016-13-ryan.roberts@arm.com
      Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
      Acked-by: Ard Biesheuvel <ardb@kernel.org>
      Tested-by: John Hubbard <jhubbard@nvidia.com>
      Acked-by: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Barry Song <21cnbao@gmail.com>
      Cc: Borislav Petkov (AMD) <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  10. 20 Feb, 2024 1 commit
  11. 16 Feb, 2024 5 commits
  12. 09 Feb, 2024 1 commit
  13. 08 Feb, 2024 1 commit
  14. 06 Feb, 2024 1 commit
    • ubsan: Remove CONFIG_UBSAN_SANITIZE_ALL · 918327e9
      Kees Cook authored
      For simplicity in splitting out UBSan options into separate rules,
      remove CONFIG_UBSAN_SANITIZE_ALL, effectively defaulting to "y", which
      is how it is generally used anyway. (There are no ":= y" cases beyond
      where a specific file is enabled when a top-level ":= n" is in effect.)
      
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Marco Elver <elver@google.com>
      Cc: linux-doc@vger.kernel.org
      Cc: linux-kbuild@vger.kernel.org
      Signed-off-by: Kees Cook <keescook@chromium.org>
  15. 12 Jan, 2024 2 commits
  16. 08 Jan, 2024 1 commit
  17. 05 Jan, 2024 1 commit
  18. 11 Dec, 2023 1 commit
  19. 06 Dec, 2023 1 commit
  20. 05 Dec, 2023 1 commit
  21. 26 Oct, 2023 1 commit
    • arm64: Restrict CPU_BIG_ENDIAN to GNU as or LLVM IAS 15.x or newer · 146a15b8
      Nathan Chancellor authored
      Prior to LLVM 15.0.0, LLVM's integrated assembler would incorrectly
      byte-swap NOP when compiling for big-endian, and the resulting series of
      bytes happened to match the encoding of FNMADD S21, S30, S0, S0.
      
      This went unnoticed until commit:
      
        34f66c4c ("arm64: Use a positive cpucap for FP/SIMD")
      
      Prior to that commit, the kernel would always enable the use of FPSIMD
      early in boot when __cpu_setup() initialized CPACR_EL1, and so usage of
      FNMADD within the kernel was not detected, but could result in the
      corruption of user or kernel FPSIMD state.
      
      After that commit, the instructions happen to trap during boot prior to
      FPSIMD being detected and enabled, e.g.
      
      | Unhandled 64-bit el1h sync exception on CPU0, ESR 0x000000001fe00000 -- ASIMD
      | CPU: 0 PID: 0 Comm: swapper Not tainted 6.6.0-rc3-00013-g34f66c4c #1
      | Hardware name: linux,dummy-virt (DT)
      | pstate: 400000c9 (nZcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
      | pc : __pi_strcmp+0x1c/0x150
      | lr : populate_properties+0xe4/0x254
      | sp : ffffd014173d3ad0
      | x29: ffffd014173d3af0 x28: fffffbfffddffcb8 x27: 0000000000000000
      | x26: 0000000000000058 x25: fffffbfffddfe054 x24: 0000000000000008
      | x23: fffffbfffddfe000 x22: fffffbfffddfe000 x21: fffffbfffddfe044
      | x20: ffffd014173d3b70 x19: 0000000000000001 x18: 0000000000000005
      | x17: 0000000000000010 x16: 0000000000000000 x15: 00000000413e7000
      | x14: 0000000000000000 x13: 0000000000001bcc x12: 0000000000000000
      | x11: 00000000d00dfeed x10: ffffd414193f2cd0 x9 : 0000000000000000
      | x8 : 0101010101010101 x7 : ffffffffffffffc0 x6 : 0000000000000000
      | x5 : 0000000000000000 x4 : 0101010101010101 x3 : 000000000000002a
      | x2 : 0000000000000001 x1 : ffffd014171f2988 x0 : fffffbfffddffcb8
      | Kernel panic - not syncing: Unhandled exception
      | CPU: 0 PID: 0 Comm: swapper Not tainted 6.6.0-rc3-00013-g34f66c4c #1
      | Hardware name: linux,dummy-virt (DT)
      | Call trace:
      |  dump_backtrace+0xec/0x108
      |  show_stack+0x18/0x2c
      |  dump_stack_lvl+0x50/0x68
      |  dump_stack+0x18/0x24
      |  panic+0x13c/0x340
      |  el1t_64_irq_handler+0x0/0x1c
      |  el1_abort+0x0/0x5c
      |  el1h_64_sync+0x64/0x68
      |  __pi_strcmp+0x1c/0x150
      |  unflatten_dt_nodes+0x1e8/0x2d8
      |  __unflatten_device_tree+0x5c/0x15c
      |  unflatten_device_tree+0x38/0x50
      |  setup_arch+0x164/0x1e0
      |  start_kernel+0x64/0x38c
      |  __primary_switched+0xbc/0xc4
      
      Restrict CONFIG_CPU_BIG_ENDIAN to a known good assembler, which is
      either GNU as or LLVM's IAS 15.0.0 and newer, which contains the linked
      commit.
      
      Closes: https://github.com/ClangBuiltLinux/linux/issues/1948
      Link: https://github.com/llvm/llvm-project/commit/1379b150991f70a5782e9a143c2ba5308da1161c
      Signed-off-by: Nathan Chancellor <nathan@kernel.org>
      Cc: stable@vger.kernel.org
      Acked-by: Mark Rutland <mark.rutland@arm.com>
      Link: https://lore.kernel.org/r/20231025-disable-arm64-be-ias-b4-llvm-15-v1-1-b25263ed8b23@kernel.org
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
  22. 04 Oct, 2023 1 commit
  23. 29 Sep, 2023 1 commit
    • arm64: errata: Add Cortex-A520 speculative unprivileged load workaround · 471470bc
      Rob Herring authored
      Implement the workaround for ARM Cortex-A520 erratum 2966298. On an
      affected Cortex-A520 core, a speculatively executed unprivileged load
      might leak data from a privileged load via a cache side channel. The
      issue only exists for loads within a translation regime with the same
      translation (e.g. same ASID and VMID). Therefore, the issue only affects
      the return to EL0.
      
      The workaround is to execute a TLBI before returning to EL0 after all
      loads of privileged data. A non-shareable TLBI to any address is
      sufficient.
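      
      Sketched at the C level (illustration only; the real change lives in the
      assembly return-to-EL0 path behind an erratum cpucap, and the exact
      instruction sequence here is an assumption consistent with the
      description above):
      
        /*
         * Sketch: a non-shareable TLBI to any address (xzr here) followed by
         * a non-shareable barrier, issued just before the return to EL0.
         */
        static __always_inline void a520_erratum_2966298_flush(void)
        {
        	asm volatile(
        		"tlbi	vale1, xzr\n"
        		"dsb	nsh\n"
        		::: "memory");
        }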
      
      The workaround isn't necessary if page table isolation (KPTI) is
      enabled, but for simplicity it is applied regardless. Page table isolation should
      normally be disabled for Cortex-A520 as it supports the CSV3 feature
      and the E0PD feature (used when KASLR is enabled).
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Rob Herring <robh@kernel.org>
      Link: https://lore.kernel.org/r/20230921194156.1050055-2-robh@kernel.org
      Signed-off-by: Will Deacon <will@kernel.org>
  24. 21 Aug, 2023 1 commit
  25. 18 Aug, 2023 2 commits
    • arm64/kexec: refactor for kernel/Kconfig.kexec · 91506f7e
      Eric DeVolder authored
      The kexec and crash kernel options are provided in the common
      kernel/Kconfig.kexec. Utilize the common options and provide
      the ARCH_SUPPORTS_ and ARCH_SELECTS_ entries to recreate the
      equivalent set of KEXEC and CRASH options.
      
      Link: https://lkml.kernel.org/r/20230712161545.87870-6-eric.devolder@oracle.com
      Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • arm64: support batched/deferred tlb shootdown during page reclamation/migration · 43b3dfdd
      Barry Song authored
      On x86, batched and deferred tlb shootdown has led to a 90% performance
      increase in tlb shootdown.  On arm64, HW can do tlb shootdown without a
      software IPI, but a sync tlbi is still quite expensive.
      
      Even running the simplest program which requires swapout can
      prove this is true:
       #include <sys/types.h>
       #include <unistd.h>
       #include <sys/mman.h>
       #include <string.h>
      
       int main()
       {
       #define SIZE (1 * 1024 * 1024)
               volatile unsigned char *p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
                                                MAP_SHARED | MAP_ANONYMOUS, -1, 0);
      
               memset(p, 0x88, SIZE);
      
               for (int k = 0; k < 10000; k++) {
                       /* swap in */
                       for (int i = 0; i < SIZE; i += 4096) {
                               (void)p[i];
                       }
      
                       /* swap out */
                       madvise(p, SIZE, MADV_PAGEOUT);
               }
       }
      
      Perf result on snapdragon 888 with 8 cores by using zRAM
      as the swap block device.
      
       ~ # perf record taskset -c 4 ./a.out
       [ perf record: Woken up 10 times to write data ]
       [ perf record: Captured and wrote 2.297 MB perf.data (60084 samples) ]
       ~ # perf report
       # To display the perf.data header info, please use --header/--header-only options.
       # To display the perf.data header info, please use --header/--header-only options.
       #
       #
       # Total Lost Samples: 0
       #
       # Samples: 60K of event 'cycles'
       # Event count (approx.): 35706225414
       #
       # Overhead  Command  Shared Object      Symbol
       # ........  .......  .................  ......
       #
          21.07%  a.out    [kernel.kallsyms]  [k] _raw_spin_unlock_irq
           8.23%  a.out    [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
           6.67%  a.out    [kernel.kallsyms]  [k] filemap_map_pages
           6.16%  a.out    [kernel.kallsyms]  [k] __zram_bvec_write
           5.36%  a.out    [kernel.kallsyms]  [k] ptep_clear_flush
           3.71%  a.out    [kernel.kallsyms]  [k] _raw_spin_lock
           3.49%  a.out    [kernel.kallsyms]  [k] memset64
           1.63%  a.out    [kernel.kallsyms]  [k] clear_page
           1.42%  a.out    [kernel.kallsyms]  [k] _raw_spin_unlock
           1.26%  a.out    [kernel.kallsyms]  [k] mod_zone_state.llvm.8525150236079521930
           1.23%  a.out    [kernel.kallsyms]  [k] xas_load
           1.15%  a.out    [kernel.kallsyms]  [k] zram_slot_lock
      
      ptep_clear_flush() takes 5.36% CPU in the micro-benchmark swapping in/out
      a page mapped by only one process.  If the page is mapped by multiple
      processes, typically more than 100 on a phone, the overhead would be
      much higher, as we have to run the tlb flush 100 times for one single page.
      Plus, tlb flush overhead will increase with the number of CPU cores due to
      the bad scalability of tlb shootdown in HW, so those ARM64 servers should
      expect much higher overhead.
      
      Further perf annotate shows that 95% of the cpu time of ptep_clear_flush
      is actually used by the final dsb() to wait for the completion of the tlb
      flush.  This provides us a very good chance to leverage the existing
      batched tlb support in the kernel.  The minimum modification is that we
      only send an async tlbi in the first stage, and send the dsb only when we
      have to sync in the second stage.
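      
      On arm64 those two stages map naturally onto the batched-unmap hooks:
      queueing a page only issues a non-syncing TLBI, and the final flush is
      just the barrier. A minimal sketch of that shape (assuming the
      __flush_tlb_page_nosync() helper; the signatures in the actual patch may
      differ):
      
        /* Stage 1: queue a per-page invalidation without waiting (no dsb). */
        static inline void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
        					     struct mm_struct *mm,
        					     unsigned long uaddr)
        {
        	__flush_tlb_page_nosync(mm, uaddr);
        }
      
        /* Stage 2: a single barrier waits for all previously issued TLBIs. */
        static inline void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
        {
        	dsb(ish);
        }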
      
      With the above simplest micro benchmark, elapsed time to finish the
      program decreases by around 5%.
      
      Typical elapsed time w/o patch:
       ~ # time taskset -c 4 ./a.out
       0.21user 14.34system 0:14.69elapsed
      w/ patch:
       ~ # time taskset -c 4 ./a.out
       0.22user 13.45system 0:13.80elapsed
      
      Also tested with the benchmark in the commit on a Kunpeng920 arm64 server
      and observed an improvement of around 12.5% with the command
      `time ./swap_bench`.
              w/o             w/
      real    0m13.460s       0m11.771s
      user    0m0.248s        0m0.279s
      sys     0m12.039s       0m11.458s
      
      Originally a 16.99% overhead of ptep_clear_flush() was noticed,
      which has been eliminated by this patch:
      
      [root@localhost yang]# perf record -- ./swap_bench && perf report
      [...]
      16.99%  swap_bench  [kernel.kallsyms]  [k] ptep_clear_flush
      
      It was tested on 4-, 8- and 128-CPU platforms and shows benefits on
      large systems, but may not show an improvement on small systems such as
      a 4-CPU platform.
      
      This patch also improves the performance of page migration. Using pmbench
      and migrating its pages between node 0 and node 1 100 times over 1G of
      memory, this patch decreases the time used by around 20% (prev
      18.338318910 sec, after 13.981866350 sec), saving the time otherwise spent
      in ptep_clear_flush().
      
      Link: https://lkml.kernel.org/r/20230717131004.12662-5-yangyicong@huawei.com
      Tested-by: Yicong Yang <yangyicong@hisilicon.com>
      Tested-by: Xin Hao <xhao@linux.alibaba.com>
      Tested-by: Punit Agrawal <punit.agrawal@bytedance.com>
      Signed-off-by: Barry Song <v-songbaohua@oppo.com>
      Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
      Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Reviewed-by: Xin Hao <xhao@linux.alibaba.com>
      Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Barry Song <baohua@kernel.org>
      Cc: Darren Hart <darren@os.amperecomputing.com>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: lipeifeng <lipeifeng@oppo.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Steven Miao <realmz6@gmail.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Zeng Tao <prime.zeng@hisilicon.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  26. 27 Jul, 2023 1 commit
  27. 10 Jul, 2023 1 commit
  28. 24 Jun, 2023 1 commit
  29. 21 Jun, 2023 1 commit
  30. 20 Jun, 2023 1 commit
  31. 19 Jun, 2023 1 commit
    • arm64: enable ARCH_WANT_KMALLOC_DMA_BOUNCE for arm64 · 1c1a429e
      Catalin Marinas authored
      With the DMA bouncing of unaligned kmalloc() buffers now in place, enable
      it for arm64 to allow the kmalloc-{8,16,32,48,96} caches.  In addition,
      always create the swiotlb buffer even when the end of RAM is within the
      32-bit physical address range (the swiotlb buffer can still be disabled on
      the kernel command line).
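      
      As an illustration of what this permits (hypothetical driver snippet, not
      from this patch): a small kmalloc() buffer may now come from one of the
      kmalloc-{8,16,32,48,96} caches, and streaming DMA to it on a non-coherent
      device is transparently bounced through swiotlb instead of risking a
      shared cache line:
      
        #include <linux/dma-mapping.h>
        #include <linux/slab.h>
      
        static int example_dma_read(struct device *dev)
        {
        	/* 32-byte buffer: may now come from kmalloc-32, i.e. smaller
        	 * than ARCH_DMA_MINALIGN and sharing a cache line with
        	 * neighbouring objects. */
        	void *buf = kmalloc(32, GFP_KERNEL);
        	dma_addr_t dma;
      
        	if (!buf)
        		return -ENOMEM;
      
        	/* On a non-coherent device this unaligned buffer is bounced
        	 * through swiotlb rather than DMA-ing into the shared line. */
        	dma = dma_map_single(dev, buf, 32, DMA_FROM_DEVICE);
        	if (dma_mapping_error(dev, dma)) {
        		kfree(buf);
        		return -EIO;
        	}
      
        	/* ... start the transfer and wait for completion ... */
      
        	dma_unmap_single(dev, dma, 32, DMA_FROM_DEVICE);
        	kfree(buf);
        	return 0;
        }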
      
      Link: https://lkml.kernel.org/r/20230612153201.554742-18-catalin.marinas@arm.com
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Tested-by: Isaac J. Manjarres <isaacmanjarres@google.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Jerry Snitselaar <jsnitsel@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Jonathan Cameron <jic23@kernel.org>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Lars-Peter Clausen <lars@metafoo.de>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Mike Snitzer <snitzer@kernel.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Saravana Kannan <saravanak@google.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  32. 16 Jun, 2023 1 commit