1. 20 Jul, 2022 1 commit
    • arm64: enable THP_SWAP for arm64 · d0637c50
      Barry Song authored
      THP_SWAP has been proven to improve the swap throughput significantly
      on x86_64 according to commit bd4c82c2 ("mm, THP, swap: delay
      splitting THP after swapped out").
      As long as arm64 uses a 4K page size, it is quite similar to x86_64
      in having 2MB PMD THPs. THP_SWAP is architecture-independent, so
      enabling it on arm64 will benefit arm64 as well.
      A corner case is that MTE has an assumption that only base pages
      can be swapped. We won't enable THP_SWAP for ARM64 hardware with
      MTE support until MTE is reworked to coexist with THP_SWAP.
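
      The patch presumably expresses this gate as an architecture hook. A
      minimal sketch of that idea, assuming an arch_thp_swp_supported()
      helper overridden by arm64 and a generic fallback returning true (the
      names and placement here are illustrative, not necessarily this
      patch's exact hunks):

       /* arm64 (sketch): THP swap-out is unsupported while MTE is in use */
       static inline bool arch_thp_swp_supported(void)
       {
               /* MTE currently assumes only base pages are swapped */
               return !system_supports_mte();
       }
       #define arch_thp_swp_supported arch_thp_swp_supported

       /* generic fallback (sketch): architectures that do not override the
        * hook keep swapping THPs out in one piece
        */
       #ifndef arch_thp_swp_supported
       static inline bool arch_thp_swp_supported(void)
       {
               return true;
       }
       #endif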
      
      A micro-benchmark was written to measure THP swapout throughput, as
      shown below:
      
       #include <stdio.h>
       #include <stdlib.h>
       #include <string.h>
       #include <sys/mman.h>
       #include <sys/time.h>

       unsigned long long tv_to_ms(struct timeval tv)
       {
               return tv.tv_sec * 1000 + tv.tv_usec / 1000;
       }

       int main(void)
       {
               struct timeval tv_b, tv_e;
       #define SIZE (400UL * 1024 * 1024)
               void *p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
               if (p == MAP_FAILED) {
                       perror("fail to get memory");
                       exit(-1);
               }

               madvise(p, SIZE, MADV_HUGEPAGE);
               memset(p, 0x11, SIZE); /* write to fault the memory in */

               gettimeofday(&tv_b, NULL);
               madvise(p, SIZE, MADV_PAGEOUT);
               gettimeofday(&tv_e, NULL);

               printf("swp out bandwidth: %llu bytes/ms\n",
                      SIZE / (tv_to_ms(tv_e) - tv_to_ms(tv_b)));

               return 0;
       }
      
      Testing was done on an RK3568 64-bit quad-core Cortex-A55 platform
      (ROCK 3A):
      thp swp throughput w/o patch: 2734 bytes/ms (mean of 10 tests)
      thp swp throughput w/  patch: 3331 bytes/ms (mean of 10 tests)
      
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Signed-off-by: Barry Song <v-songbaohua@oppo.com>
      Link: https://lore.kernel.org/r/20220720093737.133375-1-21cnbao@gmail.com
      Signed-off-by: Will Deacon <will@kernel.org>
  2. 23 Jun, 2022 7 commits
    • Documentation/arm64: update memory layout table. · 5bed6a93
      Andre Mueller authored
      Commit b89ddf4c ("arm64/bpf: Remove 128MB limit for BPF JIT programs")
      removed the BPF JIT region from the memory layout of the AArch64
      architecture. However, it forgot to update the documentation
      accordingly.
      
      - Remove the bpf jit region.
      - Fix the Start and End addresses of the modules region.
      - Fix the Start address of the vmalloc region.
      Signed-off-by: Andre Mueller <am@emlix.com>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Link: https://lore.kernel.org/r/20220621081651.61755-1-am@emlix.com
      Signed-off-by: Will Deacon <will@kernel.org>
    • arm64: kcsan: Support detecting more missing memory barriers · 4d09caec
      Kefeng Wang authored
      As "kcsan: Support detecting a subset of missing memory barriers"[1]
      introduced KCSAN_STRICT/KCSAN_WEAK_MEMORY which make kcsan detects
      more missing memory barrier, but arm64 don't have KCSAN instrumentation
      for barriers, so the new selftest test_barrier() and test cases for
      memory barrier instrumentation in kcsan_test module will fail, even
      panic on selftest.
      
      Let's prefix all barriers with __ on arm64, so that asm-generic/barrier.h
      defines the final instrumented versions of these barriers; this fixes
      the above issues.
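
      A rough sketch of the resulting split, assuming the instrumentation
      wrappers in asm-generic/barrier.h (shown for mb()/rmb()/wmb() only;
      the exact hunks may differ):

       /* arch/arm64/include/asm/barrier.h: provide only the raw barriers */
       #define __mb()          dsb(sy)
       #define __rmb()         dsb(ld)
       #define __wmb()         dsb(st)

       /* include/asm-generic/barrier.h: define the final, instrumented
        * versions on top of the __-prefixed ones so KCSAN sees them
        */
       #ifdef __mb
       #define mb()    do { kcsan_mb(); __mb(); } while (0)
       #endif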
      
      Note that barrier instrumentation can be disabled via __no_kcsan with
      appropriate compiler support (and not just with objtool's help); see
      commit bd3d5bd1 ("kcsan: Support WEAK_MEMORY with Clang where no
      objtool support exists"), which adds disable_sanitizer_instrumentation
      to the __no_kcsan attribute so that all sanitizer instrumentation is
      removed entirely (with Clang 14.0). GCC achieves the same with
      no_sanitize.
      
      [1] https://lore.kernel.org/linux-mm/20211130114433.2580590-1-elver@google.com/
      Acked-by: Marco Elver <elver@google.com>
      Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Link: https://lore.kernel.org/r/20220523113126.171714-3-wangkefeng.wang@huawei.com
      Signed-off-by: Will Deacon <will@kernel.org>
    • asm-generic: Add memory barrier dma_mb() · ed59dfd9
      Kefeng Wang authored
      The memory barrier dma_mb() was introduced by commit a76a3777
      ("iommu/arm-smmu-v3: Ensure queue is read after updating prod pointer").
      It is used to ensure that prior accesses to memory by a CPU (both reads
      and writes) are ordered w.r.t. a subsequent MMIO write.
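
      As an illustration of that ordering requirement, a hypothetical driver
      snippet (struct my_ring, DESC_READY and prod_reg are made-up names, not
      taken from this patch) might look like:

       /* Make prior reads *and* writes to the in-memory ring visible before
        * the relaxed MMIO write that publishes the new producer index.
        */
       static void ring_doorbell(struct my_ring *r, u32 prod)
       {
               r->desc[prod & r->mask].flags |= DESC_READY; /* normal memory write */
               dma_mb();                        /* order it before the MMIO below */
               writel_relaxed(prod, r->prod_reg);           /* MMIO doorbell */
       }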
      
      Reviewed-by: Arnd Bergmann <arnd@arndb.de> # for asm-generic
      Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Reviewed-by: Marco Elver <elver@google.com>
      Link: https://lore.kernel.org/r/20220523113126.171714-2-wangkefeng.wang@huawei.com
      Signed-off-by: Will Deacon <will@kernel.org>
    • arm64: boot: add zstd support · 9f6a503d
      Jisheng Zhang authored
      Support building the zstd-compressed Image.zst. As with other
      compressed formats, Image.zst is not self-decompressing and the
      bootloader still needs to handle decompression before launching
      the kernel image.
      Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
      Link: https://lore.kernel.org/r/20220619170657.2657-1-jszhang@kernel.org
      Signed-off-by: Will Deacon <will@kernel.org>
    • arm64: mm: install KPTI nG mappings with MMU enabled · 47546a19
      Ard Biesheuvel authored
      In cases where we unmap the kernel while running in user space, we rely
      on ASIDs to distinguish the minimal trampoline from the full kernel
      mapping, and this means we must use non-global attributes for those
      mappings, to ensure they are scoped by ASID and will not hit in the TLB
      inadvertently.
      
      We only do this when needed, as this is generally more costly in terms
      of TLB pressure, and so we boot without these non-global attributes, and
      apply them to all existing kernel mappings once all CPUs are up and we
      know whether or not the non-global attributes are needed. At this point,
      we cannot simply unmap and remap the entire address space, so we have to
      update all existing block and page descriptors in place.
      
      Currently, we go through a lot of trouble to perform these updates with
      the MMU and caches off, to avoid violating break before make (BBM) rules
      imposed by the architecture. Since we make changes to page tables that
      are not covered by the ID map, we gain access to those descriptors by
      disabling translations altogether. This means that the stores to memory
      are issued with device attributes, and require extra care in terms of
      coherency, which is costly. We also rely on the ID map to access a
      shared flag, which requires the ID map to be executable and writable at
      the same time, which is another thing we'd prefer to avoid.
      
      So let's switch to an approach where we replace the kernel mapping with
      a minimal mapping of a few pages that can be used for a minimal, ad-hoc
      fixmap that we can use to map each page table in turn as we traverse the
      hierarchy.
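
      Conceptually, the walk looks something like the pseudo-C below. The
      real conversion lives in the idmap_kpti_install_ng_mappings assembly
      helper; kpti_fixmap_map()/kpti_fixmap_unmap() and the one-temporary-
      mapping-per-level assumption are purely illustrative:

       /* Illustrative sketch only, not the actual implementation. */
       static void kpti_set_ng(phys_addr_t table_pa, int level)
       {
               /* hypothetical helper: map this table via the ad-hoc fixmap,
                * assuming one temporary slot per level
                */
               pteval_t *table = kpti_fixmap_map(table_pa, level);
               int i;

               for (i = 0; i < PTRS_PER_PTE; i++) {    /* 4K granule assumed */
                       pteval_t desc = table[i];

                       if (!(desc & PTE_VALID))
                               continue;

                       table[i] = desc | PTE_NG;       /* set nG, bit #11 */

                       /* descend into next-level tables; block entries do not
                        * have the table bit set, so they are not followed
                        */
                       if (level < 3 && (desc & PTE_TABLE_BIT))
                               kpti_set_ng(desc & PTE_ADDR_MASK, level + 1);
               }
               kpti_fixmap_unmap(level);
       }
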
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Link: https://lore.kernel.org/r/20220609174320.4035379-3-ardb@kernel.org
      Signed-off-by: Will Deacon <will@kernel.org>
    • arm64: kpti-ng: simplify page table traversal logic · c7eff738
      Ard Biesheuvel authored
      Simplify the KPTI G-to-nG asm helper code by:
      - pulling the 'table bit' test into the get/put macros so we can combine
        them and incorporate the entire loop;
      - moving the 'table bit' test after the update of bit #11 so we no
        longer need separate next_xxx and skip_xxx labels;
      - redefining the pmd/pud register aliases and the next_pmd/next_pud
        labels instead of branching to them if the number of configured page
        table levels is less than 3 or 4, respectively.
      
      No functional change intended, except that we now descend into a
      next-level table after setting bit #11 on its descriptor, but this
      should make no difference in practice.
      
      While at it, switch to .L-prefixed local labels so they don't clutter up
      the symbol tables, kallsyms, etc., and clean up the indentation for
      legibility.
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Reviewed-by: Mark Rutland <mark.rutland@arm.com>
      Tested-by: Mark Rutland <mark.rutland@arm.com>
      Link: https://lore.kernel.org/r/20220609174320.4035379-2-ardb@kernel.org
      Signed-off-by: Will Deacon <will@kernel.org>
    • arm64: select TRACE_IRQFLAGS_NMI_SUPPORT · 3381da25
      Mark Rutland authored
      Due to an oversight, on arm64 lockdep IRQ state tracking doesn't work as
      intended in NMI context. This demonstrably results in bogus warnings
      from lockdep, and in theory could mask a variety of issues.
      
      On arm64, we've consistently tracked IRQ flag state for NMIs (and
      saved/restored the state of the interrupted context) since commit:
      
        f0cd5ac1 ("arm64: entry: fix NMI {user, kernel}->kernel transitions")
      
      That commit fixed most lockdep issues with NMI by virtue of the
      save/restore of the lockdep state of the interrupted context. However,
      for lockdep IRQ state tracking to consistently take effect in NMI
      context it has been necessary to select TRACE_IRQFLAGS_NMI_SUPPORT since
      commit:
      
        ed004953 ("locking/lockdep: Fix TRACE_IRQFLAGS vs. NMIs")
      
      As arm64 does not select TRACE_IRQFLAGS_NMI_SUPPORT, this means that the
      lockdep state can be stale in NMI context, and some uses of that state
      can consume stale data.
      
      When an NMI is taken arm64 entry code will call arm64_enter_nmi(). This
      will enter NMI context via __nmi_enter() before calling
      lockdep_hardirqs_off() to inform lockdep that IRQs have been masked.
      Where TRACE_IRQFLAGS_NMI_SUPPORT is not selected, lockdep_hardirqs_off()
      will not update lockdep state if called in NMI context. Thus if IRQs
      were enabled in the original context, lockdep will continue to believe
      that IRQs are enabled despite the call to lockdep_hardirqs_off().
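
      The guard responsible lives in lockdep_hardirqs_off(); roughly (an
      abridged sketch, not the verbatim kernel/locking/lockdep.c source):

       void noinstr lockdep_hardirqs_off(unsigned long ip)
       {
               if (unlikely(!debug_locks))
                       return;

               if (in_nmi()) {
                       /* CONFIG_TRACE_IRQFLAGS_NMI requires the architecture
                        * to select TRACE_IRQFLAGS_NMI_SUPPORT; without it,
                        * bail out and leave the pre-NMI IRQ state in place.
                        */
                       if (!IS_ENABLED(CONFIG_TRACE_IRQFLAGS_NMI))
                               return;
               } else if (__this_cpu_read(lockdep_recursion))
                       return;

               /* ... otherwise record that hardirqs are now masked ... */
       }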
      
      However, the lockdep_assert_*() checks do take effect in NMI context,
      and will consume the stale lockdep state. If an NMI is taken from a
      context which had IRQs enabled, and during the handling of the NMI
      something calls lockdep_assert_irqs_disabled(), this will result in a
      spurious warning based upon the stale lockdep state.
      
      This can be seen when using perf with GICv3 pseudo-NMIs. Within the perf
      NMI handler we may attempt a uaccess to record the userspace callchain,
      and if this faults, the el1_abort() call in the nested context will call
      exit_to_kernel_mode() when returning, which has a
      lockdep_assert_irqs_disabled() assertion:
      
      | # ./perf record -a -g sh
      | ------------[ cut here ]------------
      | WARNING: CPU: 0 PID: 164 at arch/arm64/kernel/entry-common.c:73 exit_to_kernel_mode+0x118/0x1ac
      | Modules linked in:
      | CPU: 0 PID: 164 Comm: perf Not tainted 5.18.0-rc5 #1
      | Hardware name: linux,dummy-virt (DT)
      | pstate: 004003c5 (nzcv DAIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
      | pc : exit_to_kernel_mode+0x118/0x1ac
      | lr : el1_abort+0x80/0xbc
      | sp : ffff8000080039f0
      | pmr_save: 000000f0
      | x29: ffff8000080039f0 x28: ffff6831054e4980 x27: ffff683103adb400
      | x26: 0000000000000000 x25: 0000000000000001 x24: 0000000000000001
      | x23: 00000000804000c5 x22: 00000000000000c0 x21: 0000000000000001
      | x20: ffffbd51e635ec44 x19: ffff800008003a60 x18: 0000000000000000
      | x17: ffffaadf98d23000 x16: ffff800008004000 x15: 0000ffffd14f25c0
      | x14: 0000000000000000 x13: 00000000000018eb x12: 0000000000000040
      | x11: 000000000000001e x10: 000000002b820020 x9 : 0000000100110000
      | x8 : 000000000045cac0 x7 : 0000ffffd14f25c0 x6 : ffffbd51e639b000
      | x5 : 00000000000003e5 x4 : ffffbd51e58543b0 x3 : 0000000000000001
      | x2 : ffffaadf98d23000 x1 : ffff6831054e4980 x0 : 0000000100110000
      | Call trace:
      |  exit_to_kernel_mode+0x118/0x1ac
      |  el1_abort+0x80/0xbc
      |  el1h_64_sync_handler+0xa4/0xd0
      |  el1h_64_sync+0x74/0x78
      |  __arch_copy_from_user+0xa4/0x230
      |  get_perf_callchain+0x134/0x1e4
      |  perf_callchain+0x7c/0xa0
      |  perf_prepare_sample+0x414/0x660
      |  perf_event_output_forward+0x80/0x180
      |  __perf_event_overflow+0x70/0x13c
      |  perf_event_overflow+0x1c/0x30
      |  armv8pmu_handle_irq+0xe8/0x160
      |  armpmu_dispatch_irq+0x2c/0x70
      |  handle_percpu_devid_fasteoi_nmi+0x7c/0xbc
      |  generic_handle_domain_nmi+0x3c/0x60
      |  gic_handle_irq+0x1dc/0x310
      |  call_on_irq_stack+0x2c/0x54
      |  do_interrupt_handler+0x80/0x94
      |  el1_interrupt+0xb0/0xe4
      |  el1h_64_irq_handler+0x18/0x24
      |  el1h_64_irq+0x74/0x78
      |  lockdep_hardirqs_off+0x50/0x120
      |  trace_hardirqs_off+0x38/0x214
      |  _raw_spin_lock_irq+0x98/0xa0
      |  pipe_read+0x1f8/0x404
      |  new_sync_read+0x140/0x150
      |  vfs_read+0x190/0x1dc
      |  ksys_read+0xdc/0xfc
      |  __arm64_sys_read+0x20/0x30
      |  invoke_syscall+0x48/0x114
      |  el0_svc_common.constprop.0+0x158/0x17c
      |  do_el0_svc+0x28/0x90
      |  el0_svc+0x60/0x150
      |  el0t_64_sync_handler+0xa4/0x130
      |  el0t_64_sync+0x19c/0x1a0
      | irq event stamp: 483
      | hardirqs last  enabled at (483): [<ffffbd51e636aa24>] _raw_spin_unlock_irqrestore+0xa4/0xb0
      | hardirqs last disabled at (482): [<ffffbd51e636acd0>] _raw_spin_lock_irqsave+0xb0/0xb4
      | softirqs last  enabled at (468): [<ffffbd51e5216f58>] put_cpu_fpsimd_context+0x28/0x70
      | softirqs last disabled at (466): [<ffffbd51e5216ed4>] get_cpu_fpsimd_context+0x0/0x5c
      | ---[ end trace 0000000000000000 ]---
      
      Note that as lockdep_assert_irqs_disabled() uses WARN_ON_ONCE(), and
      this uses a BRK, the warning is logged with the real PSTATE at the time
      of the warning, which clearly has DAIF.I set, meaning IRQs (and
      pseudo-NMIs) were definitely masked and the warning is spurious.
      
      Fix this by selecting TRACE_IRQFLAGS_NMI_SUPPORT such that the existing
      entry tracking takes effect, as we had originally intended when the
      arm64 entry code was fixed for transitions to/from NMI.
      
      Arguably the lockdep_assert_*() functions should have the same NMI
      checks as the rest of the code to prevent spurious warnings when
      TRACE_IRQFLAGS_NMI_SUPPORT is not selected, but the real fix for any
      architecture is to explicitly handle the transitions to/from NMI in the
      entry code.
      
      Fixes: f0cd5ac1 ("arm64: entry: fix NMI {user, kernel}->kernel transitions")
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lore.kernel.org/r/20220511131733.4074499-3-mark.rutland@arm.com
      Signed-off-by: Will Deacon <will@kernel.org>