1. 06 Jan, 2022 1 commit
    • ARM: 9176/1: avoid literal references in inline assembly · 5fe41793
      Ard Biesheuvel authored
      Nathan reports that the new get_current() and per-CPU offset accessors
      may cause problems at build time due to the use of a literal to hold the
      address of the respective variables. This is due to the fact that LLD
      before v14 does not support the PC-relative group relocations that are
      normally used for this, and the fallback relies on literals but does not
      emit the literal pools explicitly using the .ltorg directive.
      
      ./arch/arm/include/asm/current.h:53:6: error: out of range pc-relative fixup value
              asm(LOAD_SYM_ARMV6(%0, __current) : "=r"(cur));
                  ^
      ./arch/arm/include/asm/insn.h:25:2: note: expanded from macro 'LOAD_SYM_ARMV6'
              "       ldr     " #reg ", =" #sym "                     \n\t"
              ^
      <inline asm>:1:3: note: instantiated into assembly here
                      ldr     r0, =__current
                      ^
      
      Since emitting a literal pool in this particular case is not possible,
      let's avoid LOAD_SYM_ARMV6() entirely, and use an ordinary C
      assignment instead.
      
      As it turns out, there are other such cases, and here, using .ltorg to
      emit the literal pool within range of the LDR instruction would be
      possible due to the presence of an unconditional branch right after it.
      Unfortunately, putting .ltorg directives in subsections appears to
      confuse the Clang inline assembler, resulting in similar errors even
      though the .ltorg is most definitely within range.
      
      So let's fix this by emitting the literal explicitly, and not rely on
      the assembler to figure this out. This means we have to move the fallback
      out of the LOAD_SYM_ARMV6() macro and into the callers.
      
      Link: https://github.com/ClangBuiltLinux/linux/issues/1551
      
      Fixes: 9c46929e ("ARM: implement THREAD_INFO_IN_TASK for uniprocessor systems")
      Reported-by: Nathan Chancellor <natechancellor@gmail.com>
      Tested-by: Nathan Chancellor <nathan@kernel.org>
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      5fe41793
  2. 05 Jan, 2022 1 commit
  3. 17 Dec, 2021 1 commit
  4. 06 Dec, 2021 13 commits
  5. 03 Dec, 2021 20 commits
    • ARM: riscpc: use GENERIC_IRQ_MULTI_HANDLER · c1fe8d05
      Arnd Bergmann authored
      This is one of the last platforms using the old entry path.
      While this code path is spread over a few files, it is fairly
      straightforward to convert it into an equivalent C version,
      leaving the existing algorithm and all the priority handling
      the same.
      
      Unlike most irqchip drivers, this means reading the status
      register(s) in a loop and always handling the highest-priority
      irq first.
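
The loop described above can be sketched in portable C. This is only an illustration under stated assumptions: `fake_irq_status` stands in for a hardware status register, the rule "highest set bit has the highest priority" is invented for the example, and none of the names below come from the riscpc code.

```c
#include <stdint.h>

/* Hypothetical stand-in for an interrupt status register. */
static uint8_t fake_irq_status;

/* Records the order in which IRQs were handled, for illustration only. */
static int handled[8];
static int handled_count;

static void handle_irq(int irq)
{
    handled[handled_count++] = irq;
    /* In this model, handling an IRQ clears its pending bit. */
    fake_irq_status &= (uint8_t)~(1u << irq);
}

/* Assumed priority rule: the highest set bit wins; -1 means nothing pending. */
static int highest_pending(uint8_t status)
{
    for (int bit = 7; bit >= 0; bit--)
        if (status & (1u << bit))
            return bit;
    return -1;
}

static void dispatch_pending_irqs(void)
{
    int irq;

    /*
     * Re-read the status on every pass, so that an IRQ raised while
     * another one was being handled is still served in priority order.
     */
    while ((irq = highest_pending(fake_irq_status)) >= 0)
        handle_irq(irq);
}
```

With bits 0, 2 and 5 pending, `dispatch_pending_irqs()` handles 5, then 2, then 0, and returns once the status reads back as zero.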
      
      The IOMD_IRQREQC and IOMD_IRQREQD registers are not actually
      used here, but I left the code in place for the time being,
      to keep the conversion as direct as possible. It could be
      removed in a cleanup on top.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      [ardb: drop obsolete IOMD_IRQREQC/IOMD_IRQREQD handling]
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Tested-by: Marc Zyngier <maz@kernel.org>
      Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
      c1fe8d05
    • ARM: riscpc: drop support for IOMD_IRQREQC/IOMD_IRQREQD IRQ groups · d60ff2e7
      Ard Biesheuvel authored
      Neither IOMD_IRQREQC nor IOMD_IRQREQD is ever defined, so any conditionally
      compiled code that depends on them is dead code, and can be removed.
      Suggested-by: Russell King <linux@armlinux.org.uk>
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      d60ff2e7
    • ARM: implement support for vmap'ed stacks · a1c510d0
      Ard Biesheuvel authored
      Wire up the generic support for managing task stack allocations via vmalloc,
      and implement the entry code that detects whether we faulted because of a
      stack overrun (or future stack overrun caused by pushing the pt_regs array).
      
      While this adds a fair amount of tricky entry asm code, it should be
      noted that it only adds a TST + branch to the svc_entry path. The code
      implementing the non-trivial handling of the overflow stack is emitted
      out-of-line into the .text section.
      
      Since on ARM, we rely on do_translation_fault() to keep PMD level page
      table entries that cover the vmalloc region up to date, we need to
      ensure that we don't hit such a stale PMD entry when accessing the
      stack. So we do a dummy read from the new stack while still running from
      the old one on the context switch path, and bump the vmalloc_seq counter
      when PMD level entries in the vmalloc range are modified, so that the MM
      switch fetches the latest version of the entries.
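
The sequence-counter idea in the paragraph above can be modelled in a few lines of C. This is a sketch, not the kernel implementation: `struct mm_model`, `kernel_mm`, `set_vmalloc_pmd()` and `sync_vmalloc_pmds()` are hypothetical names, and the "page table" is just a small array.

```c
#include <string.h>

#define NR_VMALLOC_PMDS 4 /* hypothetical; tiny for illustration */

struct mm_model {
    unsigned int vmalloc_seq;           /* generation of the entries below */
    unsigned long pmd[NR_VMALLOC_PMDS]; /* stand-in for PMD-level entries */
};

/* Reference copy of the kernel mappings in this model. */
static struct mm_model kernel_mm;

/* Modify a PMD-level entry in the reference copy and bump the counter. */
static void set_vmalloc_pmd(int idx, unsigned long val)
{
    kernel_mm.pmd[idx] = val;
    kernel_mm.vmalloc_seq++;
}

/*
 * Called on MM switch in this model: copy the entries only when this mm
 * has fallen behind the reference copy. Returns 1 if a copy was made.
 */
static int sync_vmalloc_pmds(struct mm_model *mm)
{
    if (mm->vmalloc_seq == kernel_mm.vmalloc_seq)
        return 0;
    memcpy(mm->pmd, kernel_mm.pmd, sizeof(mm->pmd));
    mm->vmalloc_seq = kernel_mm.vmalloc_seq;
    return 1;
}
```

The comparison is cheap, so the common case (nothing changed since the last switch) costs a single load and compare, while a stale address space picks up the new entries before it runs.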
      
      Note that we need to increase the per-mode stack by 1 word, to gain some
      space to stash a GPR until we know it is safe to touch the stack.
      However, due to the cacheline alignment of the struct, this does not
      actually increase the memory footprint of the struct stack array at all.
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Tested-by: Keith Packard <keithpac@amazon.com>
      Tested-by: Marc Zyngier <maz@kernel.org>
      Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
      a1c510d0
    • ARM: entry: rework stack realignment code in svc_entry · ae5cc07d
      Ard Biesheuvel authored
      The original Thumb-2 enablement patches updated the stack realignment
      code in svc_entry to work around the lack of a STMIB instruction in
      Thumb-2, by subtracting 4 from the frame size, inverting the sense of
      the misalignment check, and changing to a STMIA instruction and a final
      stack push of a 4 byte quantity that results in the stack becoming
      aligned at the end of the sequence. It also pushes and pops R0 to the
      stack in order to have a temp register that Thumb-2 allows in general
      purpose ALU instructions, as TST using SP is not permitted.
      
      Both are a bit problematic for vmap'ed stacks, as using the stack is
      only permitted after we decide that we did not overflow the stack, or
      have already switched to the overflow stack.
      
      As for the alignment check: the current approach creates a corner case
      where, if the initial SUB of SP ends up right at the start of the stack,
      we will end up subtracting another 8 bytes and overflowing it.  This
      means we would need to add the overflow check *after* the SUB that
      deliberately misaligns the stack. However, this would require us to keep
      local state (i.e., whether we performed the subtract or not) across the
      overflow check, but without any GPRs or stack available.
      
      So let's switch to an approach where we don't use the stack, and where
      the alignment check of the stack pointer occurs in the usual way, as
      this is guaranteed not to result in overflow. This means we will be able
      to do the overflow check first.
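
The "usual" alignment check can be pictured as a test of bit 2 of the stack pointer once the frame has been reserved. The C sketch below models only that arithmetic on a plain integer. `FRAME_SIZE` and `reserve_aligned_frame()` are hypothetical, the frame size is an arbitrary multiple of 8, and the incoming SP is assumed to be at least 4-byte aligned; the real entry code is assembly and differs in detail.

```c
#include <stdint.h>

/* Hypothetical frame size; chosen as a multiple of 8 for the example. */
#define FRAME_SIZE 72u

/*
 * Reserve a frame below sp and make the result 8-byte aligned.
 * Assumes sp is 4-byte aligned, so only bit 2 needs testing.
 */
static uint32_t reserve_aligned_frame(uint32_t sp)
{
    sp -= FRAME_SIZE;
    if (sp & 4u)   /* models a TST against #4 */
        sp -= 4u;  /* models a conditional SUB of 4 bytes */
    return sp;
}
```

For example, both `0x1000` and `0x1004` end up at `0xFB8` in this model, since the second case needs the extra 4-byte adjustment.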
      
      While at it, switch to R1 so the mode stack pointer in R0 remains
      accessible.
      Acked-by: Nicolas Pitre <nico@fluxnic.net>
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Tested-by: Marc Zyngier <maz@kernel.org>
      Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
      ae5cc07d
    • ARM: switch_to: clean up Thumb2 code path · b832faec
      Ard Biesheuvel authored
      The load-multiple instruction that essentially performs the switch_to
      operation in ARM mode, by loading all callee save registers as well as the
      stack pointer and the program counter, is split into 3 separate loads
      for Thumb-2, with the IP register used as a temporary to capture the
      value of R4 before it gets overwritten.
      
      We can clean this up a bit, by sticking with a single LDMIA instruction,
      but one that pops SP and PC into IP and LR, respectively, and by using
      ordinary move register and branch instructions to get those values into
      SP and PC. This also allows us to move the set_current call closer to
      the assignment of SP, reducing the window where those are mutually out
      of sync. This is especially relevant for CONFIG_VMAP_STACK, which is
      being introduced in a subsequent patch, where we need to issue a load
      that might fault from the new stack while running from the old one, to
      ensure that stale PMD entries in the VMALLOC space are synced up.
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Tested-by: Keith Packard <keithpac@amazon.com>
      Tested-by: Marc Zyngier <maz@kernel.org>
      Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
      b832faec
    • ARM: unwind: disregard unwind info before stack frame is set up · 532319b9
      Ard Biesheuvel authored
      When unwinding the stack from a stack overflow, we are likely to start
      from a stack push instruction, given that this is the most common way to
      grow the stack for compiler emitted code. This push instruction rarely
      appears anywhere else than at offset 0x0 of the function, and if it
      doesn't, the compiler tends to split up the unwind annotations, given
      that the stack frame layout is apparently not the same throughout the
      function.
      
      This means that, in the general case, if the frame's PC points at the
      first instruction covered by a certain unwind entry, there is no way the
      stack frame that the unwind entry describes could have been created yet,
      and so we are still on the stack frame of the caller in that case. So
      treat this as a special case, and return with the new PC taken from the
      frame's LR, without applying the unwind transformations to the virtual
      register set.
      
      This permits us to unwind the call stack on stack overflow when the
      overflow was caused by a stack push on function entry.
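
The special case described above can be sketched as follows. The structures and the function name are hypothetical simplifications rather than the kernel's unwinder types, and the `pc == lr` guard is an assumption added here so the sketch cannot report progress without making any.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified unwind table entry. */
struct unwind_entry {
    uint32_t start; /* address of the first instruction it covers */
};

/* Hypothetical, simplified view of the frame being unwound. */
struct frame {
    uint32_t pc;
    uint32_t lr;
    uint32_t sp;
};

/*
 * If pc sits on the first instruction covered by the entry, the frame
 * that the entry describes cannot exist yet, so step to the caller via
 * lr and leave sp untouched. Returns true if this case was taken.
 */
static bool unwind_at_entry(struct frame *f, const struct unwind_entry *e)
{
    if (f->pc != e->start)
        return false;
    if (f->pc == f->lr) /* assumed guard: no progress would be made */
        return false;
    f->pc = f->lr;
    return true;
}
```

When the function returns false, a full unwinder would fall through to decoding the unwind opcodes as usual; that part is omitted here.
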
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Tested-by: Keith Packard <keithpac@amazon.com>
      Tested-by: Marc Zyngier <maz@kernel.org>
      Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
      532319b9
    • ARM: memset: clean up unwind annotations · ad3d09b5
      Ard Biesheuvel authored
      The memset implementation carves up the code in different sections, each
      covered with their own unwind info. In this case, it is done in a way
      similar to how the compiler might do it, to disambiguate between parts
      where the return address is in LR and the SP is unmodified, and parts
      where a stack frame is live, and the unwinder needs to know the size of
      the stack frame and the location of the return address within it.
      
      Only the placement of the unwind directives is slightly odd: the stack
      pushes are placed in the wrong sections, which may confuse the unwinder
      when attempting to unwind with PC pointing at the stack push in
      question.
      
      So let's fix this up, by reordering the directives and instructions as
      appropriate.
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Tested-by: Keith Packard <keithpac@amazon.com>
      Tested-by: Marc Zyngier <maz@kernel.org>
      Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
      ad3d09b5
    • ARM: memmove: use frame pointer as unwind anchor · ccb81601
      Ard Biesheuvel authored
      The memmove routine is a bit unusual in the way it manages the stack
      pointer: depending on the execution path through the function, the SP
      assumes different values as different subsets of the register file are
      preserved and restored again. This is problematic when it comes to EHABI
      unwind info, as it is not instruction accurate, and does not allow
      tracking the SP value as it changes.
      
      Commit 207a6cb0 ("ARM: 8224/1: Add unwinding support for memmove
      function") addressed this by carving up the function in different chunks
      as far as the unwinder is concerned, and keeping a set of unwind
      directives for each of them, each corresponding with the state of the
      stack pointer during execution of the chunk in question. This not only
      duplicates unwind info unnecessarily, but it also complicates unwinding
      the stack upon overflow.
      
      Instead, let's do what the compiler does when the SP is updated halfway
      through a function, which is to use a frame pointer and emit the
      appropriate unwind directives to communicate this to the unwinder.
      
      Note that Thumb-2 uses R7 for this, while ARM uses R11 aka FP. So let's
      avoid touching R7 in the body of the function, so that Thumb-2 can use
      it as the frame pointer. R11 was not modified in the first place.
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Tested-by: Keith Packard <keithpac@amazon.com>
      Tested-by: Marc Zyngier <maz@kernel.org>
      Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
      ccb81601
    • ARM: memcpy: use frame pointer as unwind anchor · ba999a04
      Ard Biesheuvel authored
      The memcpy template is a bit unusual in the way it manages the stack
      pointer: depending on the execution path through the function, the SP
      assumes different values as different subsets of the register file are
      preserved and restored again. This is problematic when it comes to EHABI
      unwind info, as it is not instruction accurate, and does not allow
      tracking the SP value as it changes.
      
      Commit 279f487e ("ARM: 8225/1: Add unwinding support for memory
      copy functions") addressed this by carving up the function in different
      chunks as far as the unwinder is concerned, and keeping a set of unwind
      directives for each of them, each corresponding with the state of the
      stack pointer during execution of the chunk in question. This not only
      duplicates unwind info unnecessarily, but it also complicates unwinding
      the stack upon overflow.
      
      Instead, let's do what the compiler does when the SP is updated halfway
      through a function, which is to use a frame pointer and emit the
      appropriate unwind directives to communicate this to the unwinder.
      
      Note that Thumb-2 uses R7 for this, while ARM uses R11 aka FP. So let's
      avoid touching R7 in the body of the template, so that Thumb-2 can use
      it as the frame pointer. R11 was not modified in the first place.
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Tested-by: Keith Packard <keithpac@amazon.com>
      Tested-by: Marc Zyngier <maz@kernel.org>
      Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
      ba999a04
    • ARM: run softirqs on the per-CPU IRQ stack · 9974f857
      Ard Biesheuvel authored
      Now that we have enabled IRQ stacks, any softIRQs that are handled over
      the back of a hard IRQ will run from the IRQ stack as well. However, any
      synchronous softirq processing that happens when re-enabling softIRQs
      from task context will still execute on that task's stack.
      
      Since any call to local_bh_enable() at any level in the task's call
      stack may trigger a softIRQ processing run, which could potentially
      cause a task stack overflow if the combined stack footprints exceed the
      stack's size, let's run these synchronous invocations of do_softirq() on
      the IRQ stack as well.
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Reviewed-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Linus Walleij <linus.walleij@linaro.org>
      Tested-by: Keith Packard <keithpac@amazon.com>
      Tested-by: Marc Zyngier <maz@kernel.org>
      Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
      9974f857
    • ARM: call_with_stack: add unwind support · 0b78f2e9
      Ard Biesheuvel authored
      Restructure the code and add the unwind annotations so that both the
      frame pointer unwinder and the EHABI unwind info based unwinder
      will be able to follow the call stack through call_with_stack().
      
      Since GCC and Clang use different formats for the stack frame, two
      methods are implemented: a GCC version that pushes fp, sp, lr and pc for
      compatibility with the frame pointer unwinder, and a second version that
      works with Clang, as well as with the EHABI unwinder both in ARM and
      Thumb2 modes.
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Acked-by: Linus Walleij <linus.walleij@linaro.org>
      Tested-by: Keith Packard <keithpac@amazon.com>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Tested-by: Marc Zyngier <maz@kernel.org>
      Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
      0b78f2e9
    • ARM: implement IRQ stacks · d4664b6c
      Ard Biesheuvel authored
      Now that we no longer rely on the stack pointer to access the current
      task struct or thread info, we can implement support for IRQ stacks
      cleanly as well.
      
      Define a per-CPU IRQ stack and switch to this stack when taking an IRQ,
      provided that we were not already using that stack in the interrupted
      context. This is never the case for IRQs taken from user space, but ones
      taken while running in the kernel could fire while one taken from user
      space has not completed yet.
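
The "switch unless already on the IRQ stack" decision can be modelled with a simple range check. The sketch below is a host-side illustration only: `IRQ_STACK_SIZE`, `on_irq_stack()` and `irq_entry_sp()` are hypothetical, and the stack is assumed to be full-descending, so an empty stack has SP equal to base plus size.

```c
#include <stdbool.h>
#include <stdint.h>

#define IRQ_STACK_SIZE 8192u /* hypothetical size */

/* True if sp lies within the IRQ stack that starts at irq_stack_base. */
static bool on_irq_stack(uintptr_t sp, uintptr_t irq_stack_base)
{
    return sp > irq_stack_base && sp <= irq_stack_base + IRQ_STACK_SIZE;
}

/* Pick the stack pointer to run the IRQ handler on. */
static uintptr_t irq_entry_sp(uintptr_t sp, uintptr_t irq_stack_base)
{
    if (on_irq_stack(sp, irq_stack_base))
        return sp; /* nested IRQ: keep using the current stack */
    return irq_stack_base + IRQ_STACK_SIZE; /* start at the top */
}
```

An IRQ taken from a task stack therefore starts at the top of the IRQ stack, while one that interrupts an IRQ handler already running there simply continues below the current SP.
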
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Acked-by: Linus Walleij <linus.walleij@linaro.org>
      Tested-by: Keith Packard <keithpac@amazon.com>
      Acked-by: Nick Desaulniers <ndesaulniers@google.com>
      Tested-by: Marc Zyngier <maz@kernel.org>
      Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
      d4664b6c
    • ARM: backtrace-clang: avoid crash on bogus frame pointer · eae9523f
      Ard Biesheuvel authored
      The Clang backtrace code dereferences the link register value pulled
      from the stack to decide whether the caller was a branch-and-link
      instruction, in order to subsequently decode the offset to find the
      start of the calling function. Unlike other loads in this routine, this
      one is not protected by a fixup, and may therefore cause a crash if the
      address in question is bogus.
      
      So let's fix this, by treating the fault as a failure to decode the 'bl'
      instruction. To avoid a label renum, reuse a fixup label that guards an
      instruction that cannot fault to begin with.
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Tested-by: Marc Zyngier <maz@kernel.org>
      Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
      eae9523f
    • ARM: unwind: dump exception stack from calling frame · 4ab68270
      Ard Biesheuvel authored
      The existing code that dumps the contents of the pt_regs structure
      passed to __entry routines does so while unwinding the callee frame, and
      dereferences the stack pointer as a struct pt_regs*. This will no longer
      work when we enable support for IRQ or overflow stacks, because the
      struct pt_regs may live on the task stack, while we are executing from
      another stack.
      
      The unwinder has access to this information, but only while unwinding
      the calling frame. So let's combine the exception stack dumping code
      with the handling of the calling frame as well. By printing it before
      dumping the caller/callee addresses, the output order is preserved.
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Reviewed-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Linus Walleij <linus.walleij@linaro.org>
      Tested-by: Keith Packard <keithpac@amazon.com>
      Tested-by: Marc Zyngier <maz@kernel.org>
      Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
      4ab68270
    • ARM: export dump_mem() to other objects · 8cdfdf7f
      Ard Biesheuvel authored
      The unwind info based stack unwinder will make its own call to
      dump_mem() to dump the exception stack, so give it external linkage.
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Reviewed-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Linus Walleij <linus.walleij@linaro.org>
      Tested-by: Keith Packard <keithpac@amazon.com>
      Tested-by: Marc Zyngier <maz@kernel.org>
      Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
      8cdfdf7f
    • ARM: unwind: support unwinding across multiple stacks · b6506981
      Ard Biesheuvel authored
      Implement support in the unwinder for dealing with multiple stacks.
      This will be needed once we add support for IRQ stacks, or for the
      overflow stack used by the vmap'ed stacks code.
      
      This involves tracking the unwind opcodes that either update the virtual
      stack pointer from another virtual register, or perform an explicit
      subtract on the virtual stack pointer, and updating the low and high
      bounds that we use to sanitize the stack pointer accordingly.
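
One way to picture the bounds tracking is shown below. It is a guess at the general shape, not the kernel's code: `struct unwind_ctrl` here is a hypothetical three-field model, `STACK_SIZE` is an assumed power-of-two size to which every stack is aligned, and the value loaded into SP is assumed to point inside a stack rather than exactly at its top.

```c
#include <stdint.h>

/* Hypothetical: every stack is STACK_SIZE bytes and aligned to that size. */
#define STACK_SIZE 8192u

struct unwind_ctrl {
    uint32_t sp;      /* virtual stack pointer */
    uint32_t sp_low;  /* lowest address the unwinder may read */
    uint32_t sp_high; /* one past the highest address it may read */
};

/* SP is loaded from another virtual register: possibly a new stack. */
static void set_sp_from_reg(struct unwind_ctrl *c, uint32_t reg_value)
{
    c->sp = reg_value;
    c->sp_low = reg_value;
    c->sp_high = (reg_value & ~(STACK_SIZE - 1u)) + STACK_SIZE;
}

/* Explicit subtract on the virtual SP: the lower bound follows it. */
static void sub_sp(struct unwind_ctrl *c, uint32_t bytes)
{
    c->sp -= bytes;
    c->sp_low = c->sp;
}

/* Sanity check applied before dereferencing the virtual SP. */
static int sp_in_bounds(const struct unwind_ctrl *c)
{
    return c->sp >= c->sp_low && c->sp < c->sp_high;
}
```

Recomputing both bounds whenever SP changes stacks keeps the sanity check meaningful even when the unwind crosses from, say, an IRQ stack back to a task stack.
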
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Reviewed-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Linus Walleij <linus.walleij@linaro.org>
      Tested-by: Keith Packard <keithpac@amazon.com>
      Tested-by: Marc Zyngier <maz@kernel.org>
      Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
      b6506981
    • ARM: assembler: introduce bl_r macro · b3ab60b1
      Ard Biesheuvel authored
      Add a bl_r macro that abstracts the difference between the ways indirect
      calls are performed on older and newer ARM architecture revisions.
      
      The main difference is to prefer blx instructions over explicit LR
      assignments when possible, as these tend to confuse the prediction logic
      in out-of-order cores when speculating across a function return.
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Reviewed-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Linus Walleij <linus.walleij@linaro.org>
      Tested-by: Keith Packard <keithpac@amazon.com>
      Tested-by: Marc Zyngier <maz@kernel.org>
      Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
      b3ab60b1
    • ARM: remove some dead code · 08572cd4
      Ard Biesheuvel authored
      This code appears to be no longer used so let's get rid of it.
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Reviewed-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Linus Walleij <linus.walleij@linaro.org>
      Tested-by: Keith Packard <keithpac@amazon.com>
      Tested-by: Marc Zyngier <maz@kernel.org>
      Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
      08572cd4
    • ARM: stackprotector: prefer compiler for TLS based per-task protector · f05eb1d2
      Ard Biesheuvel authored
      Currently, we implement the per-task stack protector for ARM using a GCC
      plugin, due to lack of native compiler support. However, work is
      underway to get this implemented in the compiler, which means we will be
      able to deprecate the GCC plugin at some point.
      
      In the meantime, we will need to support both, where the native compiler
      implementation is obviously preferred. So let's wire this up in Kconfig
      and the Makefile.
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Tested-by: Marc Zyngier <maz@kernel.org>
      Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
      f05eb1d2
    • ARM: decompressor: disable stack protector · 672513bf
      Ard Biesheuvel authored
      Enabling the stack protector in the decompressor is of dubious value,
      given that it uses a fixed value for the canary, cannot print any output
      unless CONFIG_DEBUG_LL is enabled (which relies on board specific build
      time settings), and is already disabled for a good chunk of the code
      (libfdt).
      
      So let's just disable it in the decompressor. This will make it easier
      in the future to manage the command line options that would need to be
      removed again in this context for the TLS register based stack
      protector.
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      672513bf
  6. 14 Nov, 2021 4 commits