  1. 25 Jul, 2022 1 commit
  2. 21 Jul, 2022 1 commit
    • mmu_gather: Remove per arch tlb_{start,end}_vma() · 1e9fdf21
      Peter Zijlstra authored
      Scattered across the archs are 3 basic forms of tlb_{start,end}_vma().
      Provide two new MMU_GATHER_* knobs to enumerate them and remove the per
      arch tlb_{start,end}_vma() implementations.
      
       - MMU_GATHER_NO_FLUSH_CACHE indicates the arch has flush_cache_range()
         but does *NOT* want to call it for each VMA.
      
       - MMU_GATHER_MERGE_VMAS indicates the arch wants to merge the
         invalidate across multiple VMAs if possible.
      
      With these it is possible to capture the three forms:
      
        1) empty stubs;
           select MMU_GATHER_NO_FLUSH_CACHE and MMU_GATHER_MERGE_VMAS
      
        2) start: flush_cache_range(), end: empty;
           select MMU_GATHER_MERGE_VMAS
      
        3) start: flush_cache_range(), end: flush_tlb_range();
           default
      
      Obviously, if the architecture does not have flush_cache_range() then
      it also doesn't need to select MMU_GATHER_NO_FLUSH_CACHE.
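      
      For illustration, with these knobs the generic tlb_{start,end}_vma()
      can end up looking roughly like the sketch below (not the literal
      patch; tlb_flush_mmu_tlbonly() is the existing mmu_gather helper
      that performs the pending TLB flush):
      
      	static inline void tlb_start_vma(struct mmu_gather *tlb,
      					 struct vm_area_struct *vma)
      	{
      		if (tlb->fullmm)
      			return;
      	#ifndef CONFIG_MMU_GATHER_NO_FLUSH_CACHE
      		/* Forms 2 and 3: flush the cache once per VMA. */
      		flush_cache_range(vma, vma->vm_start, vma->vm_end);
      	#endif
      	}
      
      	static inline void tlb_end_vma(struct mmu_gather *tlb,
      				       struct vm_area_struct *vma)
      	{
      		if (tlb->fullmm)
      			return;
      	#ifndef CONFIG_MMU_GATHER_MERGE_VMAS
      		/* Form 3: flush this VMA's range immediately. */
      		tlb_flush_mmu_tlbonly(tlb);
      	#endif
      	}
      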
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Will Deacon <will@kernel.org>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 18 Jul, 2022 1 commit
    • random: remove CONFIG_ARCH_RANDOM · 9592eef7
      Jason A. Donenfeld authored
      When RDRAND was introduced, there was much discussion on whether it
      should be trusted and how the kernel should handle that. Initially, two
      mechanisms cropped up, CONFIG_ARCH_RANDOM, a compile time switch, and
      "nordrand", a boot-time switch.
      
      Later the thinking evolved. With a properly designed RNG, using RDRAND
      values alone won't harm anything, even if the outputs are malicious.
      Rather, the issue is whether those values are being *trusted* to be good
      or not. And so a new set of options was introduced as the real ones
      that people use -- CONFIG_RANDOM_TRUST_CPU and "random.trust_cpu".
      With these options, RDRAND is used, but it's not always credited. So in
      the worst case, it does nothing, and in the best case, maybe it helps.
      
      Along the way, CONFIG_ARCH_RANDOM's meaning got sort of pulled into the
      center and became something certain platforms force-select.
      
      The old options don't really help with much, and it's a bit odd to have
      special handling for these instructions when the kernel can deal fine
      with the existence or untrusted existence or broken existence or
      non-existence of that CPU capability.
      
      Simplify the situation by removing CONFIG_ARCH_RANDOM and using the
      ordinary asm-generic fallback pattern instead, keeping the two options
      that are actually used. This leaves "nordrand" for now, as its
      removal will take a different route.
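      
      An arch that lacks (or loses) the instructions then simply picks up
      the ordinary asm-generic stubs, which look roughly like this (a
      sketch of the fallback pattern):
      
      	/* asm-generic fallback: no arch RNG, report no entropy. */
      	static inline bool __must_check arch_get_random_long(unsigned long *v)
      	{
      		return false;
      	}
      
      	static inline bool __must_check arch_get_random_seed_long(unsigned long *v)
      	{
      		return false;
      	}
      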
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Acked-by: Borislav Petkov <bp@suse.de>
      Acked-by: Heiko Carstens <hca@linux.ibm.com>
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
  4. 30 Jun, 2022 1 commit
    • context_tracking: Split user tracking Kconfig · 24a9c541
      Frederic Weisbecker authored
      Context tracking is going to be used not only to track user transitions
      but also idle/IRQs/NMIs. The user tracking part will then become a
      separate feature. Prepare Kconfig for that.
      
      [ frederic: Apply Max Filippov feedback. ]
      Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
      Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
      Cc: Joel Fernandes <joel@joelfernandes.org>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Nicolas Saenz Julienne <nsaenz@kernel.org>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com>
      Cc: Yu Liao <liaoyu15@huawei.com>
      Cc: Phil Auld <pauld@redhat.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Alex Belits <abelits@marvell.com>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
      Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
  5. 29 Jun, 2022 1 commit
    • powerpc/memhotplug: Add add_pages override for PPC · ac790d09
      Aneesh Kumar K.V authored
      With commit ffa0b64e ("powerpc: Fix virt_addr_valid() for 64-bit Book3E & 32-bit")
      the kernel now validates addresses against the high_memory value.
      This results in the BUG_ON below with DAX pfns.
      
      [  635.798741][T26531] kernel BUG at mm/page_alloc.c:5521!
      1:mon> e
      cpu 0x1: Vector: 700 (Program Check) at [c000000007287630]
          pc: c00000000055ed48: free_pages.part.0+0x48/0x110
          lr: c00000000053ca70: tlb_finish_mmu+0x80/0xd0
          sp: c0000000072878d0
         msr: 800000000282b033
        current = 0xc00000000afabe00
        paca    = 0xc00000037ffff300   irqmask: 0x03   irq_happened: 0x05
          pid   = 26531, comm = 50-landscape-sy
      kernel BUG at :5521!
      Linux version 5.19.0-rc3-14659-g4ec05be7c2e1 (kvaneesh@ltc-boston8) (gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #625 SMP Thu Jun 23 00:35:43 CDT 2022
      1:mon> t
      [link register   ] c00000000053ca70 tlb_finish_mmu+0x80/0xd0
      [c0000000072878d0] c00000000053ca54 tlb_finish_mmu+0x64/0xd0 (unreliable)
      [c000000007287900] c000000000539424 exit_mmap+0xe4/0x2a0
      [c0000000072879e0] c00000000019fc1c mmput+0xcc/0x210
      [c000000007287a20] c000000000629230 begin_new_exec+0x5e0/0xf40
      [c000000007287ae0] c00000000070b3cc load_elf_binary+0x3ac/0x1e00
      [c000000007287c10] c000000000627af0 bprm_execve+0x3b0/0xaf0
      [c000000007287cd0] c000000000628414 do_execveat_common.isra.0+0x1e4/0x310
      [c000000007287d80] c00000000062858c sys_execve+0x4c/0x60
      [c000000007287db0] c00000000002c1b0 system_call_exception+0x160/0x2c0
      [c000000007287e10] c00000000000c53c system_call_common+0xec/0x250
      
      The fix is to make sure we update high_memory on memory hotplug.
      This is similar to what x86 does in commit 3072e413 ("mm/memory_hotplug: introduce add_pages").
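      
      A sketch of such an override, modelled on the x86 version
      (update_end_of_memory_vars() being the arch-internal helper that
      bumps max_pfn, max_low_pfn and high_memory):
      
      	int __ref add_pages(int nid, unsigned long start_pfn,
      			    unsigned long nr_pages, struct mhp_params *params)
      	{
      		int rc;
      
      		rc = __add_pages(nid, start_pfn, nr_pages, params);
      		if (rc)
      			return rc;
      
      		/* update max_pfn, max_low_pfn and high_memory */
      		update_end_of_memory_vars(start_pfn << PAGE_SHIFT,
      					  nr_pages << PAGE_SHIFT);
      		return rc;
      	}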
      
      Fixes: ffa0b64e ("powerpc: Fix virt_addr_valid() for 64-bit Book3E & 32-bit")
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20220629050925.31447-1-aneesh.kumar@linux.ibm.com
  6. 28 Jun, 2022 1 commit
    • arch/*/: remove CONFIG_VIRT_TO_BUS · 4313a249
      Arnd Bergmann authored
      All architecture-independent users of virt_to_bus() and bus_to_virt()
      have now been fixed to use the dma mapping interfaces or have been
      removed. This means the definitions on most architectures, and the
      CONFIG_VIRT_TO_BUS symbol, are now obsolete and can be removed.
      
      The only exceptions to this are a few network and scsi drivers for m68k
      Amiga and VME machines and ppc32 Macintosh. These drivers work correctly
      with the old interfaces and are probably not worth changing.
      
      On alpha and parisc, virt_to_bus() was still used in asm/floppy.h.
      alpha can use isa_virt_to_bus() like x86 does, and parisc can just
      open-code the virt_to_phys() here, as this is architecture-specific
      code.
      
      I tried updating the bus-virt-phys-mapping.rst documentation, which
      started as an email from Linus explaining some details of the Linux-2.0
      driver interfaces. The bits about virt_to_bus() were declared obsolete
      back in 2000, and the rest is no longer all that relevant, so in the
      end I just decided to remove the file completely.
      Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
      Acked-by: Helge Deller <deller@gmx.de> # parisc
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
  7. 02 Jun, 2022 1 commit
  8. 29 May, 2022 1 commit
    • powerpc: Don't select HAVE_IRQ_EXIT_ON_IRQ_STACK · 1346d00e
      Michael Ellerman authored
      The HAVE_IRQ_EXIT_ON_IRQ_STACK option tells generic code that irq_exit()
      is called while still running on the hard irq stack (hardirq_ctx[] in
      the powerpc code).
      
      Selecting the option means the generic code will *not* switch to the
      softirq stack before running softirqs, because the code is already
      running on the (mostly empty) hard irq stack.
      
      But since commit 1b1b6a6f ("powerpc: handle irq_enter/irq_exit in
      interrupt handler wrappers"), irq_exit() is now called on the regular task
      stack, not the hard irq stack.
      
      That's because previously irq_exit() was called in __do_irq() which is
      run on the hard irq stack, but now it is called in
      interrupt_async_exit_prepare() which is called from do_irq() constructed
      by the wrapper macro, which is after the switch back to the task stack.
      
      So drop HAVE_IRQ_EXIT_ON_IRQ_STACK from the Kconfig. This will mean an
      extra stack switch when processing some interrupts, but should
      significantly reduce the likelihood of stack overflow.
      
      It also means the softirq stack will be used for running softirqs from
      other interrupts that don't use the hard irq stack, e.g. timer interrupts.
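      
      The generic decision point is invoke_softirq() in kernel/softirq.c,
      which looks roughly like this:
      
      	static inline void invoke_softirq(void)
      	{
      		if (!force_irqthreads() || !__this_cpu_read(ksoftirqd)) {
      	#ifdef CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK
      			/*
      			 * We can safely execute softirq on the current
      			 * stack if it is the irq stack, because it
      			 * should be near empty at this stage.
      			 */
      			__do_softirq();
      	#else
      			/*
      			 * Otherwise, switch to the dedicated softirq
      			 * stack to avoid overflowing the task stack.
      			 */
      			do_softirq_own_stack();
      	#endif
      		} else {
      			wakeup_softirqd();
      		}
      	}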
      
      Fixes: 1b1b6a6f ("powerpc: handle irq_enter/irq_exit in interrupt handler wrappers")
      Cc: stable@vger.kernel.org # v5.12+
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20220525032639.1947280-1-mpe@ellerman.id.au
  9. 24 May, 2022 1 commit
    • kbuild: link symbol CRCs at final link, removing CONFIG_MODULE_REL_CRCS · 7b453719
      Masahiro Yamada authored
      include/{linux,asm-generic}/export.h defines a weak symbol, __crc_*
      as a placeholder.
      
      Genksyms writes the version CRCs into the linker script, which will be
      used for filling the __crc_* symbols. The linker script format depends
      on CONFIG_MODULE_REL_CRCS. If it is enabled, __crc_* holds the offset
      to the reference of CRC.
      
      It is time to get rid of this complexity.
      
      Now that modpost parses text files (.*.cmd) to collect all the CRCs,
      it can generate C code that will be linked to the vmlinux or modules.
      
      Generate a new C file, .vmlinux.export.c, which contains the CRCs of
      symbols exported by vmlinux. It is compiled and linked to vmlinux in
      scripts/link-vmlinux.sh.
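      
      The generated file is ordinary C; it might look something like this
      (the symbol names and CRC values below are made up for illustration,
      and SYMBOL_CRC() stands for the helper macro along the lines of what
      the patch adds):
      
      	/* .vmlinux.export.c -- generated by modpost (illustrative) */
      	#include <linux/export-internal.h>
      
      	SYMBOL_CRC(init_task, 0x1f0aa557, "");
      	SYMBOL_CRC(schedule, 0x8cbea24f, "");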
      
      Put the CRCs of symbols exported by modules into the existing *.mod.c
      files. No additional build step is needed for modules. As before,
      *.mod.c are compiled and linked to *.ko in scripts/Makefile.modfinal.
      
      No linker magic is used here. The new C implementation works in the
      same way, whether CONFIG_RELOCATABLE is enabled or not.
      CONFIG_MODULE_REL_CRCS is no longer needed.
      
      Previously, Kbuild invoked additional $(LD) to update the CRCs in
      objects, but this step is unneeded too.
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      Tested-by: Nathan Chancellor <nathan@kernel.org>
      Tested-by: Nicolas Schier <nicolas@fjasle.eu>
      Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
      Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # LLVM-14 (x86-64)
  10. 22 May, 2022 2 commits
    • powerpc: Book3S 64-bit outline-only KASAN support · 41b7a347
      Daniel Axtens authored
      Implement a limited form of KASAN for Book3S 64-bit machines running under
      the Radix MMU, supporting only outline mode.
      
       - Enable the compiler instrumentation to check addresses and maintain the
         shadow region. (This is the guts of KASAN which we can easily reuse.)
      
       - Require kasan-vmalloc support to handle modules and anything else in
         vmalloc space.
      
       - KASAN needs to be able to validate all pointer accesses, but we can't
         instrument all kernel addresses - only linear map and vmalloc. On boot,
         set up a single page of read-only shadow that marks all iomap and
         vmemmap accesses as valid.
      
       - Document KASAN in powerpc docs.
      
      Background
      ----------
      
      KASAN support on Book3S is a bit tricky to get right:
      
       - It would be good to support inline instrumentation so as to be able to
         catch stack issues that cannot be caught with outline mode.
      
       - Inline instrumentation requires a fixed offset.
      
       - Book3S runs code with translations off ("real mode") during boot,
         including a lot of generic device-tree parsing code which is used to
         determine MMU features.
      
          [ppc64 mm note: The kernel installs a linear mapping at effective
          address c000...-c008.... This is a one-to-one mapping with physical
          memory from 0000... onward. Because of how memory accesses work on
          powerpc 64-bit Book3S, a kernel pointer in the linear map accesses the
          same memory both with translations on (accessing as an 'effective
          address'), and with translations off (accessing as a 'real
          address'). This works in both guests and the hypervisor. For more
          details, see s5.7 of Book III of version 3 of the ISA, in particular
          the Storage Control Overview, s5.7.3, and s5.7.5 - noting that this
          KASAN implementation currently only supports Radix.]
      
       - Some code - most notably a lot of KVM code - also runs with translations
         off after boot.
      
       - Therefore any offset has to point to memory that is valid with
         translations on or off.
      
      One approach is just to give up on inline instrumentation. This way
      boot-time checks can be delayed until after the MMU is set up, and we
      can just not instrument any code that runs with translations off after
      booting. Take this approach for now and require outline instrumentation.
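      
      In outline mode the compiler emits a call into the KASAN runtime for
      every access, and the runtime locates the shadow byte with the
      generic translation helper; the per-arch choice is only the
      KASAN_SHADOW_OFFSET constant:
      
      	static inline void *kasan_mem_to_shadow(const void *addr)
      	{
      		return (void *)((unsigned long)addr >> KASAN_SHADOW_SCALE_SHIFT)
      			+ KASAN_SHADOW_OFFSET;
      	}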
      
      Previous attempts allowed inline instrumentation. However, they came with
      some unfortunate restrictions: only physically contiguous memory could be
      used and it had to be specified at compile time. Maybe we can do better in
      the future.
      
      [paulus@ozlabs.org - Rebased onto 5.17.  Note that a kernel with
       CONFIG_KASAN=y will crash during boot on a machine using HPT
       translation because not all the entry points to the generic
       KASAN code are protected with a call to kasan_arch_is_ready().]
      
      Originally-by: Balbir Singh <bsingharora@gmail.com> # ppc64 out-of-line radix version
      Signed-off-by: Daniel Axtens <dja@axtens.net>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      [mpe: Update copyright year and comment formatting]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/YoTE69OQwiG7z+Gu@cleo
    • powerpc: Add generic PAGE_SIZE config symbols · d036dc79
      Michael Ellerman authored
      Other arches (sh, mips, hexagon) use standard names for PAGE_SIZE
      related config symbols.
      
      Add matching symbols for powerpc, which are enabled by default but
      depend on our architecture specific PAGE_SIZE symbols.
      
      This allows generic/driver code to express dependencies on the PAGE_SIZE
      without needing to refer to architecture specific config symbols.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20220505125123.2088143-1-mpe@ellerman.id.au
  11. 19 May, 2022 1 commit
  12. 05 May, 2022 1 commit
  13. 29 Apr, 2022 1 commit
  14. 26 Apr, 2022 1 commit
  15. 05 Apr, 2022 1 commit
    • powerpc: Select ARCH_WANTS_MODULES_DATA_IN_VMALLOC on book3s/32 and 8xx · eeaec780
      Christophe Leroy authored
      book3s/32 and 8xx have a separate area for allocating modules,
      defined by MODULES_VADDR / MODULES_END.
      
      On book3s/32, it is not possible to protect against execution
      on a page basis. A full 256M segment is either Exec or NoExec.
      The module area is in an Exec segment while vmalloc area is
      in a NoExec segment.
      
      In order to protect module data against execution, select
      ARCH_WANTS_MODULES_DATA_IN_VMALLOC.
      
      For the 8xx (and possibly other 32-bit platforms in the future),
      there is no such constraint on Exec/NoExec protection; however,
      there is a critical distance between kernel functions and their
      callers that needs to remain below 32 Mbytes in order to avoid costly
      trampolines. By allocating data outside of the module area, we
      increase the chance for module text to remain within an acceptable
      distance from the kernel core text.
      
      So select ARCH_WANTS_MODULES_DATA_IN_VMALLOC for 8xx as well.
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
  16. 22 Mar, 2022 1 commit
  17. 18 Mar, 2022 1 commit
  18. 26 Feb, 2022 1 commit
    • usercopy: Check valid lifetime via stack depth · 2792d84e
      Kees Cook authored
      One of the things that CONFIG_HARDENED_USERCOPY sanity-checks is whether
      an object that is about to be copied to/from userspace is overlapping
      the stack at all. If it is, it performs a number of inexpensive
      bounds checks. One of the finer-grained checks is whether an object
      crosses stack frames within the stack region. Doing this on x86 with
      CONFIG_FRAME_POINTER was cheap/easy. Doing it with ORC was deemed too
      heavy, and was left out (a while ago), leaving the coarser whole-stack
      check.
      
      The LKDTM tests USERCOPY_STACK_FRAME_TO and USERCOPY_STACK_FRAME_FROM
      try to exercise these cross-frame cases to validate the defense is
      working. They have been failing ever since ORC was added (which was
      expected). While Muhammad was investigating various LKDTM failures[1],
      he asked me for additional details on them, and I realized that when
      exact stack frame boundary checking is not available (i.e. everything
      except x86 with FRAME_POINTER), it could check if a stack object is at
      least "current depth valid", in the sense that any object within the
      stack region but not between start-of-stack and current_stack_pointer
      should be considered unavailable (i.e. its lifetime is from a call no
      longer present on the stack).
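      
      In mm/usercopy.c terms the extra check might look like the sketch
      below (assuming a downward-growing stack; the real patch would also
      need a CONFIG_STACK_GROWSUP variant):
      
      	#ifdef CONFIG_ARCH_HAS_CURRENT_STACK_POINTER
      		/*
      		 * An object within the stack region but below the
      		 * current stack pointer belongs to a frame that is no
      		 * longer (or not yet) live, so reject the copy.
      		 */
      		if (obj < (const void *)current_stack_pointer)
      			return BAD_STACK;
      	#endif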
      
      Introduce ARCH_HAS_CURRENT_STACK_POINTER to track which architectures
      have actually implemented the common global register alias.
      
      Additionally report usercopy bounds checking failures with an offset
      from current_stack_pointer, which may assist with diagnosing failures.
      
      The LKDTM USERCOPY_STACK_FRAME_TO and USERCOPY_STACK_FRAME_FROM tests
      (once slightly adjusted in a separate patch) pass again with this fixed.
      
      [1] https://github.com/kernelci/kernelci-project/issues/84
      
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: linux-mm@kvack.org
      Reported-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      ---
      v1: https://lore.kernel.org/lkml/20220216201449.2087956-1-keescook@chromium.org
      v2: https://lore.kernel.org/lkml/20220224060342.1855457-1-keescook@chromium.org
      v3: https://lore.kernel.org/lkml/20220225173345.3358109-1-keescook@chromium.org
      v4: - improve commit log (akpm)
  19. 16 Feb, 2022 1 commit
  20. 12 Feb, 2022 1 commit
  21. 07 Feb, 2022 2 commits
  22. 20 Jan, 2022 1 commit
    • mm: percpu: generalize percpu related config · 7ecd19cf
      Kefeng Wang authored
      Patch series "mm: percpu: Cleanup percpu first chunk function".
      
      When adding support for the page mapping percpu first chunk allocator
      on arm64, we found a lot of duplicated code in the percpu embed/page
      first chunk allocators. This patchset aims to clean that up and
      introduces no functional change.
      
      The current support status for 'embed' and 'page' across
      architectures is shown below,
      
      	embed: NEED_PER_CPU_EMBED_FIRST_CHUNK
      	page:  NEED_PER_CPU_PAGE_FIRST_CHUNK
      
      		embed	page
      	------------------------
      	arm64	  Y	 Y
      	mips	  Y	 N
      	powerpc	  Y	 Y
      	riscv	  Y	 N
      	sparc	  Y	 Y
      	x86	  Y	 Y
      	------------------------
      
      There are two interfaces to the percpu first chunk allocator,
      
       extern int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size,
                                      size_t atom_size,
                                      pcpu_fc_cpu_distance_fn_t cpu_distance_fn,
      -                               pcpu_fc_alloc_fn_t alloc_fn,
      -                               pcpu_fc_free_fn_t free_fn);
      +                               pcpu_fc_cpu_to_node_fn_t cpu_to_nd_fn);
      
       extern int __init pcpu_page_first_chunk(size_t reserved_size,
      -                               pcpu_fc_alloc_fn_t alloc_fn,
      -                               pcpu_fc_free_fn_t free_fn,
      -                               pcpu_fc_populate_pte_fn_t populate_pte_fn);
      +                               pcpu_fc_cpu_to_node_fn_t cpu_to_nd_fn);
      
      pcpu_fc_alloc_fn_t/pcpu_fc_free_fn_t are killed; we provide generic
      pcpu_fc_alloc() and pcpu_fc_free() functions, which are called in
      pcpu_embed/page_first_chunk().
      
       1) For pcpu_embed_first_chunk(), a pcpu_fc_cpu_to_node_fn_t needs to
          be provided when the arch supports NUMA.
      
       2) For pcpu_page_first_chunk(), pcpu_fc_populate_pte_fn_t is killed
          too; a generic pcpu_populate_pte() marked '__weak' is provided.
          If an arch (like x86) needs a different function to populate
          PTEs, it should provide its own implementation (see the sketch
          below).
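      
      With that, an arch's setup_per_cpu_areas() only has to pass a
      cpu-to-node callback. A minimal hypothetical caller (the reserve
      sizes and the panic message are placeholders; a real arch would also
      record the per-CPU offsets afterwards):
      
      	static int __init pcpu_cpu_to_node(int cpu)
      	{
      		/* A NUMA-aware arch would return early_cpu_to_node(cpu). */
      		return NUMA_NO_NODE;
      	}
      
      	void __init setup_per_cpu_areas(void)
      	{
      		int rc;
      
      		rc = pcpu_embed_first_chunk(PERCPU_MODULE_RESERVE,
      					    PERCPU_DYNAMIC_RESERVE,
      					    PAGE_SIZE, NULL, pcpu_cpu_to_node);
      		if (rc < 0)
      			panic("Failed to initialize percpu areas.");
      	}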
      
      [1] https://github.com/kevin78/linux.git percpu-cleanup
      
      This patch (of 4):
      
      The HAVE_SETUP_PER_CPU_AREA/NEED_PER_CPU_EMBED_FIRST_CHUNK/
      NEED_PER_CPU_PAGE_FIRST_CHUNK/USE_PERCPU_NUMA_NODE_ID configs have
      duplicate definitions on the platforms that subscribe to them.
      
      Move them into mm, drop the redundant definitions, and instead just
      select them on the applicable platforms.
      
      Link: https://lkml.kernel.org/r/20211216112359.103822-1-wangkefeng.wang@huawei.com
      Link: https://lkml.kernel.org/r/20211216112359.103822-2-wangkefeng.wang@huawei.com
      Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
      Cc: Will Deacon <will@kernel.org>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Dennis Zhou <dennis@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  23. 09 Dec, 2021 2 commits
  24. 02 Dec, 2021 1 commit
  25. 29 Nov, 2021 2 commits
  26. 27 Oct, 2021 2 commits
  27. 22 Oct, 2021 5 commits
    • powerpc: Activate CONFIG_STRICT_KERNEL_RWX by default · fdacae8a
      Christophe Leroy authored
      CONFIG_STRICT_KERNEL_RWX should be set by default on every
      architecture (see https://github.com/KSPP/linux/issues/4).
      
      On PPC32 we have to find a compromise between performance and/or
      memory waste and the selection of STRICT_KERNEL_RWX, because it
      implies either smaller memory chunks or larger alignment between
      RO memory and RW memory.
      
      For instance, the 8xx maps memory with 8M pages, so either the limit
      between RO and RW must be 8M aligned or it falls back to 512k pages,
      which implies more pressure on the TLB.
      
      book3s/32 maps memory with BATs as much as possible. BATs can have
      any power-of-two size between 128k and 256M, but we have only 4 to 8
      BATs, so the alignment must be good enough to allow efficient use of
      the BATs and avoid falling back on standard page mapping, which would
      kill performance.
      
      So let's go one step forward and make it the default but still allow
      users to unset it when wanted.
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/057c40164084bfc7d77c0b2ff78d95dbf6a2a21b.1632503622.git.christophe.leroy@csgroup.eu
    • powerpc/32: Add support for out-of-line static calls · 5c810ced
      Christophe Leroy authored
      Add support for out-of-line static calls on PPC32. This change
      improves the performance of calls to global function pointers by
      using direct calls instead of indirect calls.
      
      The trampoline is initially populated with a 'blr' or a branch to the
      target, followed by an unreachable long jump sequence.
      
      In order to cater for parallel execution, the trampoline needs to
      be updated in a way that ensures it remains consistent at all times.
      This means we can't use the traditional lis/addi to load r12 with
      the target address, otherwise there would be a window during which
      the first instruction contains the upper part of the new target
      address while the second instruction still contains the lower part of
      the old target address. To avoid that, the target address is stored
      just after the 'bctr' and loaded from there with a single instruction.
      
      Then, depending on the target distance, arch_static_call_transform()
      will either replace the first instruction with a direct 'bl <target>',
      or with a 'nop' in order to have the trampoline fall through the long
      jump sequence.
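      
      A sketch of the resulting arch_static_call_transform() logic (the
      PPC_SCT_DATA offset name and the exact patching helpers are
      assumptions based on the description above):
      
      	void arch_static_call_transform(void *site, void *tramp, void *func,
      					bool tail)
      	{
      		unsigned long target = (unsigned long)func;
      		bool is_short = is_offset_in_branch_range((long)target -
      							  (long)tramp);
      
      		mutex_lock(&text_mutex);
      
      		/* Update the address literal stored after the 'bctr'. */
      		if (func && !is_short)
      			patch_instruction(tramp + PPC_SCT_DATA,
      					  ppc_inst(target));
      
      		if (!func)
      			patch_instruction(tramp, ppc_inst(PPC_RAW_BLR()));
      		else if (is_short)
      			/* Near target: a single direct branch suffices. */
      			patch_branch(tramp, target, 0);
      		else
      			/* Far target: fall through to the long jump. */
      			patch_instruction(tramp, ppc_inst(PPC_RAW_NOP()));
      
      		mutex_unlock(&text_mutex);
      	}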
      
      For the special case of __static_call_return0(), to avoid the risk of
      a far branch, a version of it is inlined at the end of the trampoline.
      
      Performance-wise, the long jump sequence is probably not better than
      the indirect calls set up by GCC when we don't use static calls, but
      such calls are unlikely to be required on powerpc32: with most
      configurations the kernel size is far below 32 Mbytes, so only
      modules may happen to be too far away. And even modules are likely to
      be close enough, as they are allocated below the kernel core and
      as close as possible to the kernel text.
      
      static_call selftest is running successfully with this change.
      
      With this patch, __do_irq() has the following sequence to trace
      irq entries:
      
      	c0004a00 <__SCT__tp_func_irq_entry>:
      	c0004a00:	48 00 00 e0 	b       c0004ae0 <__traceiter_irq_entry>
      	c0004a04:	3d 80 c0 00 	lis     r12,-16384
      	c0004a08:	81 8c 4a 1c 	lwz     r12,18972(r12)
      	c0004a0c:	7d 89 03 a6 	mtctr   r12
      	c0004a10:	4e 80 04 20 	bctr
      	c0004a14:	38 60 00 00 	li      r3,0
      	c0004a18:	4e 80 00 20 	blr
      	c0004a1c:	00 00 00 00 	.long 0x0
      ...
      	c0005654 <__do_irq>:
      ...
      	c0005664:	7c 7f 1b 78 	mr      r31,r3
      ...
      	c00056a0:	81 22 00 00 	lwz     r9,0(r2)
      	c00056a4:	39 29 00 01 	addi    r9,r9,1
      	c00056a8:	91 22 00 00 	stw     r9,0(r2)
      	c00056ac:	3d 20 c0 af 	lis     r9,-16209
      	c00056b0:	81 29 74 cc 	lwz     r9,29900(r9)
      	c00056b4:	2c 09 00 00 	cmpwi   r9,0
      	c00056b8:	41 82 00 10 	beq     c00056c8 <__do_irq+0x74>
      	c00056bc:	80 69 00 04 	lwz     r3,4(r9)
      	c00056c0:	7f e4 fb 78 	mr      r4,r31
      	c00056c4:	4b ff f3 3d 	bl      c0004a00 <__SCT__tp_func_irq_entry>
      
      Before this patch, __do_irq() was doing the following to trace irq
      entries:
      
      	c0005700 <__do_irq>:
      ...
      	c0005710:	7c 7e 1b 78 	mr      r30,r3
      ...
      	c000574c:	93 e1 00 0c 	stw     r31,12(r1)
      	c0005750:	81 22 00 00 	lwz     r9,0(r2)
      	c0005754:	39 29 00 01 	addi    r9,r9,1
      	c0005758:	91 22 00 00 	stw     r9,0(r2)
      	c000575c:	3d 20 c0 af 	lis     r9,-16209
      	c0005760:	83 e9 f4 cc 	lwz     r31,-2868(r9)
      	c0005764:	2c 1f 00 00 	cmpwi   r31,0
      	c0005768:	41 82 00 24 	beq     c000578c <__do_irq+0x8c>
      	c000576c:	81 3f 00 00 	lwz     r9,0(r31)
      	c0005770:	80 7f 00 04 	lwz     r3,4(r31)
      	c0005774:	7d 29 03 a6 	mtctr   r9
      	c0005778:	7f c4 f3 78 	mr      r4,r30
      	c000577c:	4e 80 04 21 	bctrl
      	c0005780:	85 3f 00 0c 	lwzu    r9,12(r31)
      	c0005784:	2c 09 00 00 	cmpwi   r9,0
      	c0005788:	40 82 ff e4 	bne     c000576c <__do_irq+0x6c>
      
      Besides the fact that we now use a direct 'bl' instead of a
      'load/mtctr/bctr' sequence, we can also see that we use one less
      register on the stack.
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/6ec2a7865ed6a5ec54ab46d026785bafe1d837ea.1630484892.git.christophe.leroy@csgroup.eu
    • powerpc/audit: Convert powerpc to AUDIT_ARCH_COMPAT_GENERIC · 566af8cd
      Christophe Leroy authored
      Commit e65e1fc2 ("[PATCH] syscall class hookup for all normal
      targets") added generic support for AUDIT, but that didn't include
      support for bi-arch platforms like powerpc.
      
      Commit 4b588411 ("audit: Add generic compat syscall support")
      added generic support for bi-arch.
      
      Convert powerpc to that bi-arch generic audit support.
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/a4b3951d1191d4183d92a07a6097566bde60d00a.1629812058.git.christophe.leroy@csgroup.eu
    • powerpc/fsl_booke: Enable STRICT_KERNEL_RWX · 49e3d8ea
      Christophe Leroy authored
      Enable STRICT_KERNEL_RWX on fsl_booke.
      
      For that, we need additional TLBCAMs dedicated to linear mapping,
      based on the alignment of _sinittext.
      
      By default, up to 768 Mbytes of memory are mapped.
      It uses 3 TLBCAMs of size 256 Mbytes.
      
      With a data alignment of 16, we need up to 9 TLBCAMs:
        16/16/16/16/64/64/64/256/256
      
      With a data alignment of 4, we need up to 12 TLBCAMs:
        4/4/4/4/16/16/16/64/64/64/256/256
      
      With a data alignment of 1, we need up to 15 TLBCAMs:
        1/1/1/1/4/4/4/16/16/16/64/64/64/256/256
      
      By default, set a 16 Mbytes alignment as a compromise between memory
      usage and the number of TLBCAMs. This can be adjusted manually when
      needed.
      
      For the time being, it doesn't work when the base is randomised.
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/29f9e5d2bbbc83ae9ca879265426a6278bf4d5bb.1634292136.git.christophe.leroy@csgroup.eu
    • powerpc/booke: Disable STRICT_KERNEL_RWX, DEBUG_PAGEALLOC and KFENCE · 68b44f94
      Christophe Leroy authored
      fsl_booke and 44x are not able to map kernel linear memory with
      pages, so they can't support DEBUG_PAGEALLOC and KFENCE, and
      STRICT_KERNEL_RWX is also a problem for now.
      
      Enable those only on book3s (both 32 and 64 except KFENCE), 8xx and 40x.
      
      Fixes: 88df6e90 ("[POWERPC] DEBUG_PAGEALLOC for 32-bit")
      Fixes: 95902e6c ("powerpc/mm: Implement STRICT_KERNEL_RWX on PPC32")
      Fixes: 90cbac0e ("powerpc: Enable KFENCE for PPC32")
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/d1ad9fdd9b27da3fdfa16510bb542ed51fa6e134.1634292136.git.christophe.leroy@csgroup.eu
  28. 25 Aug, 2021 1 commit
  29. 16 Aug, 2021 1 commit
  30. 30 Jul, 2021 2 commits