1. 11 Feb, 2021 27 commits
  2. 08 Feb, 2021 13 commits
    • powerpc/64s: Handle program checks in wrong endian during early boot · e7eb9190
      Michael Ellerman authored
      There's a short window during boot where although the kernel is
      running little endian, any exceptions will cause the CPU to switch
      back to big endian. This situation persists until we call
      configure_exceptions(), which calls either the hypervisor or OPAL to
      configure the CPU so that exceptions will be taken in little
      endian (via HID0[HILE]).
      
      We don't intend to take exceptions during early boot, but one way we
      sometimes do is via a WARN/BUG etc. Those all boil down to a trap
      instruction, which will cause a program check exception.
      
      The first instruction of the program check handler is an mtsprg, which
      when executed in the wrong endian is an lhzu with a ~3GB displacement
      from r3. The content of r3 is random, so that becomes a load from some
      random location, and depending on the system (installed RAM etc.) can
      easily lead to a checkstop, or an infinitely recursive page fault.
      That prevents whatever the WARN/BUG was complaining about being
      printed to the console, and the user just sees a dead system.
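
As a rough illustration of the decoding problem described above, the sketch below byte-swaps one instruction word and extracts the fields a CPU running in the opposite endian would see. The constant is an assumed encoding of `mtsprg 0,r13` (mtspr to SPR 272), chosen for the example and not taken from the handler source. All helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed example encoding: mtspr 272 (SPRG0), r13. Not taken from the kernel source. */
#define MTSPRG0_R13 0x7db043a6u

/* Reverse the byte order of a 32-bit instruction word. */
uint32_t swap32(uint32_t w)
{
    return (w >> 24) | ((w >> 8) & 0x0000ff00u) |
           ((w << 8) & 0x00ff0000u) | (w << 24);
}

/* Primary opcode: the top 6 bits of a Power ISA instruction word. */
unsigned int primary_opcode(uint32_t w)
{
    return w >> 26;
}

/* RA field of a D-form instruction such as lhzu. */
unsigned int dform_ra(uint32_t w)
{
    return (w >> 16) & 0x1f;
}
```

Under that assumption, `primary_opcode(MTSPRG0_R13)` is 31 (the mtspr family), while `primary_opcode(swap32(MTSPRG0_R13))` is 41, the primary opcode of lhzu, and `dform_ra()` of the swapped word is 3, i.e. r3 as the base register.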
      
      We can fix it by having a trampoline at the beginning of the program
      check handler that detects we are in the wrong endian, and flips us
      back to the correct endian.
      
      We can't flip MSR[LE] using mtmsr (alas), so we have to use rfid. That
      requires backing up SRR0/1 as well as a GPR. To do that we use
      SPRG0/2/3 (SPRG1 is already used for the paca). SPRG3 is user
      readable, but this trampoline is only active very early in boot, and
      SPRG3 will be reinitialised in vdso_getcpu_init() before userspace
      starts.
      
      With this trampoline in place we can survive a WARN early in boot and
      generate a stack trace, which is printed once the console is up,
      e.g.:
      
        [83565.758545] kexec_core: Starting new kernel
        [    0.000000] ------------[ cut here ]------------
        [    0.000000] static_key_enable_cpuslocked(): static key '0xc000000000ea6160' used before call to jump_label_init()
        [    0.000000] WARNING: CPU: 0 PID: 0 at kernel/jump_label.c:166 static_key_enable_cpuslocked+0xfc/0x120
        [    0.000000] Modules linked in:
        [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0-gcc-8.2.0-dirty #618
        [    0.000000] NIP:  c0000000002fd46c LR: c0000000002fd468 CTR: c000000000170660
        [    0.000000] REGS: c000000001227940 TRAP: 0700   Not tainted  (5.10.0-gcc-8.2.0-dirty)
        [    0.000000] MSR:  9000000002823003 <SF,HV,VEC,VSX,FP,ME,RI,LE>  CR: 24882422  XER: 20040000
        [    0.000000] CFAR: 0000000000000730 IRQMASK: 1
        [    0.000000] GPR00: c0000000002fd468 c000000001227bd0 c000000001228300 0000000000000065
        [    0.000000] GPR04: 0000000000000001 0000000000000065 c0000000010cf970 000000000000000d
        [    0.000000] GPR08: 0000000000000000 0000000000000000 0000000000000000 c00000000122763f
        [    0.000000] GPR12: 0000000000002000 c000000000f8a980 0000000000000000 0000000000000000
        [    0.000000] GPR16: 0000000000000000 0000000000000000 c000000000f88c8e c000000000f88c9a
        [    0.000000] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
        [    0.000000] GPR24: 0000000000000000 c000000000dea3a8 0000000000000000 c000000000f35114
        [    0.000000] GPR28: 0000002800000000 c000000000f88c9a c000000000f88c8e c000000000ea6160
        [    0.000000] NIP [c0000000002fd46c] static_key_enable_cpuslocked+0xfc/0x120
        [    0.000000] LR [c0000000002fd468] static_key_enable_cpuslocked+0xf8/0x120
        [    0.000000] Call Trace:
        [    0.000000] [c000000001227bd0] [c0000000002fd468] static_key_enable_cpuslocked+0xf8/0x120 (unreliable)
        [    0.000000] [c000000001227c40] [c0000000002fd4c0] static_key_enable+0x30/0x50
        [    0.000000] [c000000001227c70] [c000000000f6629c] early_page_poison_param+0x58/0x9c
        [    0.000000] [c000000001227cb0] [c000000000f351b8] do_early_param+0xa4/0x10c
        [    0.000000] [c000000001227d30] [c00000000011e020] parse_args+0x270/0x5e0
        [    0.000000] [c000000001227e20] [c000000000f35864] parse_early_options+0x48/0x5c
        [    0.000000] [c000000001227e40] [c000000000f358d0] parse_early_param+0x58/0x84
        [    0.000000] [c000000001227e70] [c000000000f3a368] early_init_devtree+0xc4/0x490
        [    0.000000] [c000000001227f10] [c000000000f3bca0] early_setup+0xc8/0x1c8
        [    0.000000] [c000000001227f90] [000000000000c320] 0xc320
        [    0.000000] Instruction dump:
        [    0.000000] 4bfffddd 7c2004ac 39200001 913f0000 4bffffb8 7c651b78 3c82ffac 3c62ffc0
        [    0.000000] 38841b00 3863f310 4bdf03a5 60000000 <0fe00000> 4bffff38 60000000 60000000
        [    0.000000] random: get_random_bytes called from print_oops_end_marker+0x40/0x80 with crng_init=0
        [    0.000000] ---[ end trace 0000000000000000 ]---
        [    0.000000] dt-cpu-ftrs: setup for ISA 3000
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210202130207.1303975-2-mpe@ellerman.id.au
    • powerpc/64: Make stack tracing work during very early boot · 0ecf6a9e
      Michael Ellerman authored
      If we try to stack trace very early during boot, either due to a
      WARN/BUG or manual dump_stack(), we will oops in
      valid_emergency_stack() when we try to dereference the paca_ptrs
      array.
      
      The fix is simple: we just return false if paca_ptrs isn't allocated
      yet. The stack pointer definitely isn't part of any emergency stack
      because we haven't allocated any yet.
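
A minimal sketch of the guard described above, assuming a simplified stand-in for the paca array. The struct, the stack size, and all `_sketch` names are hypothetical and only model the idea; they are not the kernel's definitions.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_STACK_SIZE 16384UL   /* assumed stack size, for illustration only */

struct paca_sketch {
    unsigned long emergency_sp;     /* top of this CPU's emergency stack */
};

/* Stays NULL until early boot has allocated the array. */
struct paca_sketch **paca_ptrs_sketch = NULL;

bool valid_emergency_stack_sketch(unsigned long sp, int cpu)
{
    unsigned long top;

    /* Nothing allocated yet, so sp cannot lie on an emergency stack. */
    if (!paca_ptrs_sketch)
        return false;

    top = paca_ptrs_sketch[cpu]->emergency_sp;
    return sp >= top - SKETCH_STACK_SIZE && sp < top;
}
```

The early return is the whole point: it avoids dereferencing the array before it exists.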
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210202130207.1303975-1-mpe@ellerman.id.au
    • powerpc64/idle: Fix SP offsets when saving GPRs · 73287caa
      Christopher M. Riedl authored
      The idle entry/exit code saves/restores GPRs in the stack "red zone"
      (Protected Zone according to PowerPC64 ELF ABI v2). However, the offset
      used for the first GPR is incorrect and overwrites the back chain - the
      Protected Zone actually starts below the current SP. In practice this is
      probably not an issue, but it's still incorrect so fix it.
      
      Also expand the comments to explain why using the stack "red zone"
      instead of creating a new stackframe is appropriate here.
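
The offset arithmetic can be sketched as follows, assuming 8-byte save slots and a back chain stored at 0(SP). Both helper names are hypothetical. The second helper only models a zero-based layout of the kind the message calls incorrect, not the exact code that was replaced.

```c
#define GPR_SLOT_BYTES 8

/* Offset from SP of the i-th saved GPR when slots start below SP. */
long redzone_slot_offset(unsigned int i)
{
    return -(long)GPR_SLOT_BYTES * ((long)i + 1);
}

/* Zero-based layout for comparison: slot 0 lands on 0(SP), the back chain. */
long zero_based_slot_offset(unsigned int i)
{
    return -(long)GPR_SLOT_BYTES * (long)i;
}
```

With the first helper every slot has a strictly negative offset, so no save can touch the back chain word.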
      Signed-off-by: Christopher M. Riedl <cmr@codefail.de>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210206072342.5067-1-cmr@codefail.de
    • powerpc/32s: Allow constant folding in mtsr()/mfsr() · b842d131
      Christophe Leroy authored
      In the same way as we did for wrtee(), add an alternative
      using mtsr/mfsr instructions instead of mtsrin/mfsrin
      when the segment register can be determined at compile time.
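
The selection can be sketched with `__builtin_constant_p`, a GCC/Clang builtin. This is an illustration, not the patch itself: the macro and enum names are hypothetical, and it assumes the segment register is picked by the top 4 bits of a 32-bit address. It reports which instruction form would be chosen and emits no instruction.

```c
#include <stdint.h>

enum sr_form {
    SR_FORM_INDIRECT,   /* mtsrin/mfsrin: a register selects the segment */
    SR_FORM_IMMEDIATE   /* mtsr/mfsr: segment number encoded in the instruction */
};

/* Assumed mapping: top 4 bits of a 32-bit address pick one of 16 segments. */
#define SR_INDEX(addr) ((uint32_t)(addr) >> 28)

/* Choose the immediate form only when the index is a compile-time constant. */
#define SR_FORM(addr) \
    (__builtin_constant_p(SR_INDEX(addr)) ? SR_FORM_IMMEDIATE : SR_FORM_INDIRECT)
```

A literal address such as `0xc0000000u` selects `SR_FORM_IMMEDIATE`, whereas a value read from a `volatile` variable cannot be folded and selects `SR_FORM_INDIRECT`.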
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/9baed0ff9d76723ec90f1b567ddd4ac1ecc7a190.1612612022.git.christophe.leroy@csgroup.eu
    • powerpc/32s: mfsrin()/mtsrin() become mfsr()/mtsr() · 179ae57d
      Christophe Leroy authored
      Function names should tell what the function does, not how.
      
      mfsrin() and mtsrin() read and write segment registers.
      
      They are called that way because they are using mfsrin and mtsrin
      instructions, but it doesn't matter for the caller.
      
      In preparation for the following patch, change their names to mfsr() and
      mtsr() to make it obvious that they manipulate segment registers,
      without exposing how they do it.
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/f92d99f4349391b77766745900231aa880a0efb5.1612612022.git.christophe.leroy@csgroup.eu
    • powerpc/32s: Change mfsrin() into a static inline function · fd659e8f
      Christophe Leroy authored
      mfsrin() is a macro.
      
      Change it into an inline function to avoid conflicts in KVM
      and make it easier to evolve.
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/72c7b9879e2e2e6f5c27dadda6486386c2b50f23.1612612022.git.christophe.leroy@csgroup.eu
    • powerpc/uaccess: Perform barrier_nospec() in KUAP allowance helpers · 8524e2e7
      Christophe Leroy authored
      barrier_nospec() in uaccess helpers is there to protect against
      speculative accesses around access_ok().
      
      When using user_access_begin() sequences together with
      unsafe_get_user()-like macros, barrier_nospec() is called for
      every single read although we know the access_ok() check is done
      only once.
      
      Since all user accesses must be granted by a call to either
      allow_read_from_user() or allow_read_write_user() which will
      always happen after the access_ok() check, move the barrier_nospec()
      there.
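
The effect of moving the barrier can be modelled with a counter, as in the sketch below. Every function here is a hypothetical stand-in with a `_sketch` suffix. Nothing is a real kernel helper, and the "barrier" only increments a counter so the number of calls can be observed.

```c
#include <stddef.h>

unsigned int nospec_barriers;   /* counts barriers issued, for illustration */

void barrier_nospec_sketch(void)
{
    nospec_barriers++;
}

/* Stand-in for the "allow" step: one barrier when the access window opens. */
void allow_read_from_user_sketch(void)
{
    barrier_nospec_sketch();
}

void prevent_read_from_user_sketch(void)
{
}

/* Stand-in for an unsafe_get_user()-style read: no barrier per access. */
int unsafe_read_sketch(const int *src)
{
    return *src;
}

long sum_user_ints_sketch(const int *src, size_t n)
{
    long sum = 0;

    allow_read_from_user_sketch();
    for (size_t i = 0; i < n; i++)
        sum += unsafe_read_sketch(&src[i]);
    prevent_read_from_user_sketch();
    return sum;
}
```

Reading n values inside one allow/prevent window issues a single barrier instead of n.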
      Reported-by: Christopher M. Riedl <cmr@codefail.de>
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/c72f014730823b413528e90ab6c4d3bcb79f8497.1612692067.git.christophe.leroy@csgroup.eu
    • powerpc/sstep: Fix darn emulation · 22b89ba1
      Sandipan Das authored
      Commit 8813ff49 ("powerpc/sstep: Check instruction validity
      against ISA version before emulation") introduced a proper way to skip
      unknown instructions. This makes sure that the same is used for the
      darn instruction when the range selection bits have a reserved value.
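
A sketch of the range-selection check, assuming the 2-bit L field of `darn RT,L` occupies the two bits just above bit 16 of the instruction word, and that L = 3 is the reserved value. The enum and function names are hypothetical and this is not the kernel's emulation code.

```c
#include <stdint.h>

enum darn_kind {
    DARN_32_CONDITIONED,    /* L = 0 */
    DARN_64_CONDITIONED,    /* L = 1 */
    DARN_64_RAW,            /* L = 2 */
    DARN_UNKNOWN            /* L = 3, reserved: treat as unknown */
};

/* Classify a darn instruction word by its range selection (L) field. */
enum darn_kind classify_darn(uint32_t word)
{
    switch ((word >> 16) & 0x3) {
    case 0:
        return DARN_32_CONDITIONED;
    case 1:
        return DARN_64_CONDITIONED;
    case 2:
        return DARN_64_RAW;
    default:
        return DARN_UNKNOWN;
    }
}

/* Build a darn word: primary opcode 31, extended opcode 755. */
uint32_t encode_darn(unsigned int rt, unsigned int l)
{
    return (31u << 26) | ((rt & 0x1fu) << 21) | ((l & 0x3u) << 16) | (755u << 1);
}
```

Returning a distinct "unknown" result for the reserved value lets the caller skip emulation instead of guessing a behaviour.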
      
      Fixes: a23987ef ("powerpc: sstep: Add support for darn instruction")
      Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210204080744.135785-2-sandipan@linux.ibm.com
    • powerpc/sstep: Fix load-store and update emulation · bbda4b6c
      Sandipan Das authored
      The Power ISA says that the fixed-point load and update instructions
      must neither use R0 for the base address (RA) nor have the
      destination (RT) and the base address (RA) as the same register.
      Similarly, for fixed-point stores and floating-point loads and stores,
      the instruction is invalid when R0 is used as the base address (RA).
      
      This is applicable to the following instructions.
        * Load Byte and Zero with Update (lbzu)
        * Load Byte and Zero with Update Indexed (lbzux)
        * Load Halfword and Zero with Update (lhzu)
        * Load Halfword and Zero with Update Indexed (lhzux)
        * Load Halfword Algebraic with Update (lhau)
        * Load Halfword Algebraic with Update Indexed (lhaux)
        * Load Word and Zero with Update (lwzu)
        * Load Word and Zero with Update Indexed (lwzux)
        * Load Word Algebraic with Update Indexed (lwaux)
        * Load Doubleword with Update (ldu)
        * Load Doubleword with Update Indexed (ldux)
        * Load Floating Single with Update (lfsu)
        * Load Floating Single with Update Indexed (lfsux)
        * Load Floating Double with Update (lfdu)
        * Load Floating Double with Update Indexed (lfdux)
        * Store Byte with Update (stbu)
        * Store Byte with Update Indexed (stbux)
        * Store Halfword with Update (sthu)
        * Store Halfword with Update Indexed (sthux)
        * Store Word with Update (stwu)
        * Store Word with Update Indexed (stwux)
        * Store Doubleword with Update (stdu)
        * Store Doubleword with Update Indexed (stdux)
        * Store Floating Single with Update (stfsu)
        * Store Floating Single with Update Indexed (stfsux)
        * Store Floating Double with Update (stfdu)
        * Store Floating Double with Update Indexed (stfdux)
      
      E.g. the following behaviour is observed for an invalid load and
      update instruction having RA = RT.
      
      While a userspace program having an instruction word like 0xe9ce0001,
      i.e. ldu r14, 0(r14), runs without receiving a SIGILL on a
      Power system (observed on P8 and P9), the outcome of executing that
      instruction word varies and its behaviour can be considered to be
      undefined.
      
      Attaching an uprobe at that instruction's address results in emulation
      which currently performs the load as well as writes the effective
      address back to the base register. This might not match the outcome
      from hardware.
      
      To remove any inconsistencies, this adds additional checks for the
      aforementioned instructions to make sure that the emulation
      infrastructure treats them as unknown. The kernel can then fall back to
      executing such instructions on hardware.
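
The validity rules stated above can be sketched as a small predicate. The helper names are hypothetical, and the field extraction assumes the usual RT/RA positions of these instruction forms. It models the rule only, not the kernel's analysis code.

```c
#include <stdbool.h>
#include <stdint.h>

/* RT (or RS/FRT) field: bits just below the 6-bit primary opcode. */
unsigned int insn_rt(uint32_t word)
{
    return (word >> 21) & 0x1f;
}

/* RA field: the base address register. */
unsigned int insn_ra(uint32_t word)
{
    return (word >> 16) & 0x1f;
}

/*
 * gpr_load is true for fixed-point loads with update, false for
 * fixed-point stores and floating-point loads/stores with update.
 */
bool update_form_is_valid(bool gpr_load, unsigned int rt, unsigned int ra)
{
    if (ra == 0)
        return false;           /* R0 as base is invalid for all update forms */
    if (gpr_load && ra == rt)
        return false;           /* load would clobber its own base register */
    return true;
}
```

For the instruction word 0xe9ce0001 from the example above, both fields decode to 14, so the predicate rejects it as a fixed-point load.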
      
      Fixes: 0016a4cf ("powerpc: Emulate most Book I instructions in emulate_step()")
      Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
      Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210204080744.135785-1-sandipan@linux.ibm.com
    • powerpc/8xx: Fix software emulation interrupt · 903178d0
      Christophe Leroy authored
      For unimplemented instructions or unimplemented SPRs, the 8xx triggers
      a "Software Emulation Exception" (0x1000). That interrupt doesn't set
      reason bits in SRR1 as the "Program Check Exception" does.
      
      Go through emulation_assist_interrupt() to set REASON_ILLEGAL.
      
      Fixes: fbbcc3bb ("powerpc/8xx: Remove SoftwareEmulation()")
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/ad782af87a222efc79cfb06079b0fd23d4224eaf.1612515180.git.christophe.leroy@csgroup.eu
    • powerpc/perf: Record counter overflow always if SAMPLE_IP is unset · d137845c
      Athira Rajeev authored
      While sampling for marked events, currently we record the sample only
      if the SIAR valid bit of the Sampled Instruction Event Register (SIER) is
      set. The SIAR_VALID bit is used for fetching the instruction address from
      the Sampled Instruction Address Register (SIAR). But there are some
      use cases where the user is interested only in the PMU stats at each
      counter overflow and the exact IP of the overflow event is not
      required. Dropping SIAR invalid samples will fail to record some of
      the counter overflows in such cases.
      
      An example of such a use case is dumping the PMU stats (event counts)
      after some regular number of instructions/events from userspace (e.g.
      via ptrace). Here counter overflow is indicated to userspace via a
      signal handler, and captured by monitoring and enabling I/O signaling
      on the event file descriptor. In these cases, we expect to get a
      sample/overflow indication after each specified sample_period.
      
      The perf event attribute will not have PERF_SAMPLE_IP set in the
      sample_type if the exact IP of the overflow event is not requested. So
      while profiling, if SAMPLE_IP is not set, just record the counter
      overflow irrespective of the SIAR_VALID check.
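
The resulting decision can be sketched as a single predicate. The names are hypothetical. `SKETCH_SAMPLE_IP` is a local stand-in assumed to mirror the `PERF_SAMPLE_IP` bit, and the sketch leaves out every other condition the real handler checks.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in, assumed to mirror the PERF_SAMPLE_IP bit. */
#define SKETCH_SAMPLE_IP (1ULL << 0)

/* Decide whether a counter overflow should be recorded as a sample. */
bool should_record_overflow(uint64_t sample_type, bool siar_valid)
{
    /* The IP was not requested, so an invalid SIAR is no reason to drop. */
    if (!(sample_type & SKETCH_SAMPLE_IP))
        return true;

    /* The IP was requested: only record when SIAR holds a valid address. */
    return siar_valid;
}
```

Only the combination "IP requested and SIAR invalid" drops the sample; every other combination records the overflow.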
      Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      [mpe: Reflow comment and if formatting]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/1612516492-1428-1-git-send-email-atrajeev@linux.vnet.ibm.com
    • powerpc/pseries/dlpar: handle ibm, configure-connector delay status · 768d70e1
      Nathan Lynch authored
      dlpar_configure_connector() has two problems in its handling of
      ibm,configure-connector's return status:
      
      1. When the status is -2 (busy, call again), we call
         ibm,configure-connector again immediately without checking whether
         to schedule, which can result in monopolizing the CPU.
      2. Extended delay status (9900..9905) goes completely unhandled,
         causing the configuration to unnecessarily terminate.
      
      Fix both of these issues by using rtas_busy_delay().
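
A sketch of the status handling this implies, not the kernel's rtas_busy_delay() itself. It assumes that status -2 suggests a short retry and that an extended delay status 9900 + n suggests waiting 10^n milliseconds. All names are hypothetical, and the retry loop consumes a scripted list of statuses in place of real firmware calls.

```c
#define SKETCH_RTAS_BUSY      (-2)
#define SKETCH_EXT_DELAY_MIN  9900
#define SKETCH_EXT_DELAY_MAX  9905

/* Milliseconds to wait before retrying, or 0 if status is not busy/delay. */
unsigned int busy_delay_ms_sketch(int status)
{
    unsigned int ms = 1;

    if (status == SKETCH_RTAS_BUSY)
        return 1;
    if (status < SKETCH_EXT_DELAY_MIN || status > SKETCH_EXT_DELAY_MAX)
        return 0;
    for (int order = status - SKETCH_EXT_DELAY_MIN; order > 0; order--)
        ms *= 10;
    return ms;
}

/*
 * Walk a scripted sequence of statuses standing in for repeated calls.
 * Accumulates the delay a real caller would sleep for, and returns the
 * first status that is neither busy nor an extended delay.
 */
int retry_until_done_sketch(const int *statuses, unsigned int n,
                            unsigned int *waited_ms)
{
    int rc = SKETCH_RTAS_BUSY;

    for (unsigned int i = 0; i < n; i++) {
        unsigned int ms;

        rc = statuses[i];
        ms = busy_delay_ms_sketch(rc);
        if (ms == 0)
            break;
        *waited_ms += ms;   /* a real caller would sleep or reschedule here */
    }
    return rc;
}
```

Handling both status families in one helper means the caller neither spins on -2 nor gives up on 9900..9905.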
      
      Fixes: ab519a01 ("powerpc/pseries: Kernel DLPAR Infrastructure")
      Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
      Reviewed-by: Tyrel Datwyler <tyreld@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210107025900.410369-1-nathanl@linux.ibm.com
    • powerpc/64s: Implement ptep_clear_flush_young that does not flush TLBs · 3cb1aa7a
      Nicholas Piggin authored
      Similarly to the x86 commit b13b1d2d ("x86/mm: In the PTE swapout
      page reclaim case clear the accessed bit instead of flushing the TLB"),
      implement ptep_clear_flush_young that does not actually flush the TLB
      in the case the referenced bit is cleared.
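
The idea can be modelled on a plain integer "PTE", as in the sketch below. The bit position and the function name are hypothetical, and a real implementation would update the PTE atomically. The point shown is only that the accessed bit is tested and cleared and no TLB flush follows.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PTE_ACCESSED (1ULL << 0)   /* hypothetical bit position */

/*
 * Clear the accessed ("young") bit and report whether it was set.
 * Deliberately performs no TLB flush.
 */
bool ptep_clear_young_noflush_sketch(uint64_t *pte)
{
    bool young = (*pte & SKETCH_PTE_ACCESSED) != 0;

    *pte &= ~SKETCH_PTE_ACCESSED;
    return young;
}
```

A stale TLB entry may delay the next setting of the bit, which is the trade-off accepted in exchange for skipping the flush.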
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20201217134731.488135-8-npiggin@gmail.com