1. 11 Feb, 2021 11 commits
    • powerpc/32: Preserve cr1 in exception prolog stack check to fix build error · 3642eb21
      Christophe Leroy authored
      THREAD_ALIGN_SHIFT = THREAD_SHIFT + 1 = PAGE_SHIFT + 1
      Maximum PAGE_SHIFT is 18 for 256k pages so
      THREAD_ALIGN_SHIFT is 19 at the maximum.
      
      No need to clobber cr1, it can be preserved when moving r1
      into CR when we check stack overflow.
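
The reasoning about cr1 can be checked with a little arithmetic. The sketch below is illustrative only. It assumes that THREAD_SHIFT equals PAGE_SHIFT (as the first line of this message implies), that PAGE_SHIFT ranges from 12 to 18, and that the overflow check tests bit 32 - THREAD_ALIGN_SHIFT of the copy of r1 in CR, with bit 0 as the most significant bit. The function names are hypothetical and are not kernel symbols.

```c
#include <assert.h>

/* Assumption: THREAD_SHIFT == PAGE_SHIFT, so THREAD_ALIGN_SHIFT = PAGE_SHIFT + 1. */
static unsigned int thread_align_shift(unsigned int page_shift)
{
	/* Assumed range: 4k pages (12) up to 256k pages (18). */
	assert(page_shift >= 12 && page_shift <= 18);
	return page_shift + 1;
}

/*
 * Assumption: the stack check tests bit (32 - THREAD_ALIGN_SHIFT) of r1
 * after it has been copied into CR, using big-endian bit numbering.
 * Each CR field holds 4 bits, so the field index is bit / 4.
 */
static unsigned int cr_field_tested(unsigned int page_shift)
{
	unsigned int bit = 32 - thread_align_shift(page_shift);

	return bit / 4;
}
```

Under those assumptions the tested bit always falls in CR field 3 or later, so field 1 (cr1) never has to be overwritten.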
      
      This reduces the number of instructions in Machine Check Exception
      prolog and fixes a build failure reported by the kernel test robot
      on v5.10 stable when building with RTAS + VMAP_STACK + KVM. That
      build failure is due to too many instructions in the prolog hence
      not fitting between 0x200 and 0x300. Although the problem doesn't
      show up in mainline, it is still worth the change.
      
      Fixes: 98bf2d3f ("powerpc/32s: Fix RTAS machine check with VMAP stack")
      Cc: stable@vger.kernel.org
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/5ae4d545e3ac58e133d2599e0deb88843cb494fc.1612768623.git.christophe.leroy@csgroup.eu
    • powerpc/64s: Remove EXSLB interrupt save area · ac7c5e9b
      Nicholas Piggin authored
      SLB faults should not be taken while the PACA save areas are live.
      Until C code is called or any other accesses are made, all memory
      accesses should be fetches from the kernel text and accesses to the
      PACA and the current stack.
      
      All of these have pinned SLBs so will not take an SLB fault. Therefore
      EXSLB is not required.
      
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210208063406.331655-1-npiggin@gmail.com
    • powerpc/64s: syscall real mode entry use mtmsrd rather than rfid · 14ad0e7d
      Nicholas Piggin authored
      Have the real mode system call entry handler branch to the kernel
      0xc000... address and then use mtmsrd to enable the MMU, rather than use
      SRRs and rfid.
      
      Commit 8729c26e ("powerpc/64s/exception: Move real to virt switch
      into the common handler") implemented this style of real mode entry for
      other interrupt handlers, so this brings system calls into line with
      them, which is the main motivation for the change.
      
      This tends to be slightly faster due to avoiding the mtsprs, and it also
      does not clobber the SRR registers, which becomes important in a
      subsequent change. The real mode entry points don't tend to be too
      important for performance these days, but it is possible for a
      hypervisor to run guests in AIL=0 mode for certain reasons.
      
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210208063326.331502-1-npiggin@gmail.com
    • powerpc/kuap: Restore AMR after replaying soft interrupts · 60a707d0
      Alexey Kardashevskiy authored
      Since de78a9c4 ("powerpc: Add a framework for Kernel Userspace
      Access Protection"), user access helpers call user_{read|write}_access_{begin|end}
      when user space access is allowed.
      
      Commit 890274c2 ("powerpc/64s: Implement KUAP for Radix MMU") made
      the mentioned helpers program the AMR special register to allow such
      access for a short period of time; most of the time the AMR is
      expected to block user memory access by the kernel.
      
      Since the code accesses the user space memory, unsafe_get_user() calls
      might_fault() which calls arch_local_irq_restore() if either
      CONFIG_PROVE_LOCKING or CONFIG_DEBUG_ATOMIC_SLEEP is enabled.
      arch_local_irq_restore() then attempts to replay pending soft
      interrupts as KUAP regions have hardware interrupts enabled.
      
      If a pending interrupt happens to do user access (performance
      interrupts do that), it enables access for a short period of time and
      then blocks it again, so after returning from the replay the user
      access state remains blocked. If a user page fault then happens,
      "Bug: Read fault blocked by AMR!" appears and SIGSEGV is sent.
      
      An example trace:
        Bug: Read fault blocked by AMR!
        WARNING: CPU: 0 PID: 1603 at /home/aik/p/kernel/arch/powerpc/include/asm/book3s/64/kup-radix.h:145
        CPU: 0 PID: 1603 Comm: amr Not tainted 5.10.0-rc6_v5.10-rc6_a+fstn1 #24
        NIP:  c00000000009ece8 LR: c00000000009ece4 CTR: 0000000000000000
        REGS: c00000000dc63560 TRAP: 0700   Not tainted  (5.10.0-rc6_v5.10-rc6_a+fstn1)
        MSR:  8000000000021033 <SF,ME,IR,DR,RI,LE>  CR: 28002888  XER: 20040000
        CFAR: c0000000001fa928 IRQMASK: 1
        GPR00: c00000000009ece4 c00000000dc637f0 c000000002397600 000000000000001f
        GPR04: c0000000020eb318 0000000000000000 c00000000dc63494 0000000000000027
        GPR08: c00000007fe4de68 c00000000dfe9180 0000000000000000 0000000000000001
        GPR12: 0000000000002000 c0000000030a0000 0000000000000000 0000000000000000
        GPR16: 0000000000000000 0000000000000000 0000000000000000 bfffffffffffffff
        GPR20: 0000000000000000 c0000000134a4020 c0000000019c2218 0000000000000fe0
        GPR24: 0000000000000000 0000000000000000 c00000000d106200 0000000040000000
        GPR28: 0000000000000000 0000000000000300 c00000000dc63910 c000000001946730
        NIP __do_page_fault+0xb38/0xde0
        LR  __do_page_fault+0xb34/0xde0
        Call Trace:
          __do_page_fault+0xb34/0xde0 (unreliable)
          handle_page_fault+0x10/0x2c
        --- interrupt: 300 at strncpy_from_user+0x290/0x440
            LR = strncpy_from_user+0x284/0x440
          strncpy_from_user+0x2f0/0x440 (unreliable)
          getname_flags+0x88/0x2c0
          do_sys_openat2+0x2d4/0x5f0
          do_sys_open+0xcc/0x140
          system_call_exception+0x160/0x240
          system_call_common+0xf0/0x27c
      
      To fix it, save/restore the AMR when replaying interrupts, and also
      add a check for whether the AMR was not blocked prior to replaying
      interrupts.
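
The save/restore idea can be illustrated with a small user-space simulation. This is only a sketch of the pattern, not the kernel change itself: the AMR is modelled as a plain variable, the two state values are arbitrary, and every `sim_` name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Arbitrary simulated states; these are not the real AMR encodings. */
#define SIM_AMR_OPEN    0u
#define SIM_AMR_BLOCKED 1u

static uint64_t sim_amr = SIM_AMR_BLOCKED;

/* A replayed handler that touches user memory. */
static void sim_replayed_handler(void)
{
	sim_amr = SIM_AMR_OPEN;     /* open a short user access window */
	/* ... the handler would read user memory here ... */
	sim_amr = SIM_AMR_BLOCKED;  /* and block access again on exit */
}

/* Old behaviour: whatever state the handler leaves behind is kept. */
static void sim_replay_unprotected(void)
{
	sim_replayed_handler();
}

/* Fixed behaviour: remember the caller's state and put it back. */
static void sim_replay_with_restore(void)
{
	uint64_t saved = sim_amr;

	/* Only two states are modelled in this sketch. */
	assert(saved == SIM_AMR_OPEN || saved == SIM_AMR_BLOCKED);
	sim_replayed_handler();
	sim_amr = saved;
}

/* Returns 1 if an open user access window survives the replay. */
static int sim_state_survives(void (*replay)(void))
{
	sim_amr = SIM_AMR_OPEN;     /* caller has unlocked user access */
	replay();
	return sim_amr == SIM_AMR_OPEN;
}
```

Calling `sim_state_survives()` with each replay function shows the difference: without the restore the caller comes back to a blocked state, which is the situation that leads to the fault in the trace above.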
      
      Originally found by syzkaller.
      
      Fixes: 890274c2 ("powerpc/64s: Implement KUAP for Radix MMU")
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
      [mpe: Use normal commit citation format and add full oops log to
            change log, move kuap_check_amr() into the restore routine to
            avoid warnings about unreconciled IRQ state]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210202091541.36499-1-aik@ozlabs.ru
    • powerpc/uaccess: Avoid might_fault() when user access is enabled · 7d506ca9
      Alexey Kardashevskiy authored
      The amount of code executed with enabled user space access (unlocked
      KUAP) should be minimal. However with CONFIG_PROVE_LOCKING or
      CONFIG_DEBUG_ATOMIC_SLEEP enabled, might_fault() calls into various
      parts of the kernel, and may even end up replaying interrupts which in
      turn may access user space and forget to restore the KUAP state.
      
      The problem places are:
        1. strncpy_from_user (and similar) which unlock KUAP and call
           unsafe_get_user -> __get_user_allowed -> __get_user_nocheck()
           with do_allow=false to skip KUAP as the caller took care of it.
        2. __unsafe_put_user_goto() which is called with unlocked KUAP.
      
      eg:
        WARNING: CPU: 30 PID: 1 at arch/powerpc/include/asm/book3s/64/kup.h:324 arch_local_irq_restore+0x160/0x190
        NIP arch_local_irq_restore+0x160/0x190
        LR  lock_is_held_type+0x140/0x200
        Call Trace:
          0xc00000007f392ff8 (unreliable)
          ___might_sleep+0x180/0x320
          __might_fault+0x50/0xe0
          filldir64+0x2d0/0x5d0
          call_filldir+0xc8/0x180
          ext4_readdir+0x948/0xb40
          iterate_dir+0x1ec/0x240
          sys_getdents64+0x80/0x290
          system_call_exception+0x160/0x280
          system_call_common+0xf0/0x27c
      
      Change __get_user_nocheck() to look at `do_allow` to decide whether to
      skip might_fault(). Since strncpy_from_user/etc call might_fault()
      anyway before unlocking KUAP, there should be no visible change.
      
      Drop might_fault() in __unsafe_put_user_goto() as it is only called
      from unsafe_put_user(), which already has KUAP unlocked.
      
      Since keeping might_fault() is still desirable for debugging, add
      calls to it in user_[read|write]_access_begin(). That also allows us
      to drop the is_kernel_addr() test, because there should be no code
      using user_[read|write]_access_begin() in order to access a kernel
      address.
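
The placement of might_fault() described above can be sketched as a user-space model. This is an illustration under stated assumptions, not the kernel code: KUAP is modelled as a boolean, might_fault() as a counter, and all `sim_` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

static bool sim_kuap_unlocked;      /* true while user access is enabled */
static int sim_might_fault_calls;   /* how often the debug hook ran */

/* Stand-in for might_fault(): must never run with user access enabled. */
static void sim_might_fault(void)
{
	assert(!sim_kuap_unlocked);
	sim_might_fault_calls++;
}

/* Stand-in for user_read_access_begin(): debug check first, then unlock. */
static void sim_user_access_begin(void)
{
	sim_might_fault();
	sim_kuap_unlocked = true;
}

static void sim_user_access_end(void)
{
	sim_kuap_unlocked = false;
}

/*
 * Stand-in for __get_user_nocheck(). With do_allow == true the helper
 * unlocks access itself, so it may call the debug hook beforehand. With
 * do_allow == false the caller already unlocked access, so the hook is
 * skipped.
 */
static int sim_get_user_nocheck(const int *src, int *dst, bool do_allow)
{
	if (do_allow) {
		sim_might_fault();
		sim_kuap_unlocked = true;
	}

	*dst = *src;

	if (do_allow)
		sim_kuap_unlocked = false;
	return 0;
}
```

In this model the debug hook still runs once per access sequence, but always before the window is opened, which is the property the change is after.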
      
      Fixes: de78a9c4 ("powerpc: Add a framework for Kernel Userspace Access Protection")
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      [mpe: Combine with related patch from myself, merge change logs]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210204121612.32721-1-aik@ozlabs.ru
    • powerpc/uaccess: Simplify unsafe_put_user() implementation · de4ffc65
      Michael Ellerman authored
      Currently unsafe_put_user() expands to __put_user_goto(), which
      expands to __put_user_nocheck_goto().
      
      There are no other uses of __put_user_nocheck_goto(), and although
      there are some other uses of __put_user_goto() those could just use
      unsafe_put_user().
      
      Every layer of indirection introduces the possibility that some code
      is calling that layer, and makes keeping track of the required
      semantics at each point more complicated.
      
      So drop __put_user_goto(), and rename __put_user_nocheck_goto() to
      __unsafe_put_user_goto(). The "nocheck" is implied by "unsafe".
      
      Replace the few uses of __put_user_goto() with unsafe_put_user().
      
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210208135717.2618798-1-mpe@ellerman.id.au
    • powerpc/amigaone: Make amigaone_discover_phbs() static · f30520c6
      Michael Ellerman authored
      It's only used in setup.c, so make it static.
      
      Fixes: 053d58c8 ("powerpc/amigaone: Move PHB discovery")
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210210130804.3190952-3-mpe@ellerman.id.au
    • powerpc/mm/64s: Fix no previous prototype warning · 2bb421a3
      Michael Ellerman authored
      As reported by lkp:
      
        arch/powerpc/mm/book3s64/radix_tlb.c:646:6: warning: no previous
        prototype for function 'exit_lazy_flush_tlb'
      
      Fix it by moving the prototype into the existing header.
      
      Fixes: 032b7f08 ("powerpc/64s/radix: serialize_against_pte_lookup IPIs trim mm_cpumask")
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210210130804.3190952-2-mpe@ellerman.id.au
    • powerpc/83xx: Fix build error when CONFIG_PCI=n · 5c47c44f
      Michael Ellerman authored
      As reported by lkp:
      
        arch/powerpc/platforms/83xx/km83xx.c:183:19: error: 'mpc83xx_setup_pci' undeclared here (not in a function)
           183 |  .discover_phbs = mpc83xx_setup_pci,
               |                   ^~~~~~~~~~~~~~~~~
               |                   mpc83xx_setup_arch
      
      There is a stub defined for the CONFIG_PCI=n case, but now that
      mpc83xx_setup_pci() is being assigned to discover_phbs the correct
      empty value is NULL.
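
A minimal sketch of the idea, assuming a cut-down machine description: when PCI support is compiled out, the name used in the initializer expands to NULL instead of referring to a function. The `sim_` structure and helpers are hypothetical, and CONFIG_PCI is deliberately left undefined here to model CONFIG_PCI=n.

```c
#include <assert.h>
#include <stddef.h>

#ifdef CONFIG_PCI
void mpc83xx_setup_pci(void);
#else
/* With PCI disabled, the "empty" callback value is NULL. */
#define mpc83xx_setup_pci NULL
#endif

/* Hypothetical, cut-down stand-in for the kernel's machine description. */
struct sim_machdep {
	void (*setup_arch)(void);
	void (*discover_phbs)(void);
};

static void sim_setup_arch(void)
{
}

static const struct sim_machdep sim_km83xx = {
	.setup_arch	= sim_setup_arch,
	.discover_phbs	= mpc83xx_setup_pci,
};

/* Calls the optional callback; returns 1 if it ran, 0 if it was absent. */
static int sim_run_discover_phbs(const struct sim_machdep *m)
{
	assert(m != NULL);
	if (!m->discover_phbs)
		return 0;
	m->discover_phbs();
	return 1;
}
```

An empty inline stub works when the function is only called, but an initializer needs a value for the pointer itself, so NULL is the natural choice for an optional callback.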
      
      Fixes: 83f84041 ("powerpc/83xx: Move PHB discovery")
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210210130804.3190952-1-mpe@ellerman.id.au
    • powerpc: remove interrupt handler functions from the noinstr section · e4bb64c7
      Nicholas Piggin authored
      The allyesconfig ppc64 kernel fails to link with relocations unable to
      fit after commit 3a96570f ("powerpc: convert interrupt handlers to
      use wrappers"), which is due to the interrupt handler functions being
      put into the .noinstr.text section, which the linker script places on
      the opposite side of the main .text section from the interrupt entry
      asm code which calls the handlers.
      
      This results in a lot of linker stubs that overwhelm the 252-byte sized
      space we allow for them, or in the case of BE a .opd relocation link
      error for some reason.
      
      It's not required to put interrupt handlers in the .noinstr section;
      previously they used NOKPROBE_SYMBOL. So take them out and replace
      with a NOKPROBE_SYMBOL in the wrapper macro. Remove the explicit
      NOKPROBE_SYMBOL macros in the interrupt handler functions. This makes
      a number of interrupt handlers nokprobe that were not prior to the
      interrupt wrappers commit, but since that commit they were made
      nokprobe due to being in .noinstr.text, so this fix does not change
      that.
      
      The fixes tag is different to the commit that first exposes the problem
      because it is where the wrapper macros were introduced.
      
      Fixes: 8d41fc61 ("powerpc: interrupt handler wrapper functions")
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      [mpe: Slightly fix up comment wording]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210211063636.236420-1-npiggin@gmail.com
    • powerpc/powernv/pci: Use kzalloc() for phb related allocations · dea6f4c6
      Michael Ellerman authored
      As part of commit fbbefb32 ("powerpc/pci: Move PHB discovery for
      PCI_DN using platforms"), I switched some allocations from
      memblock_alloc() to kmalloc(), otherwise memblock would warn that it
      was being called after slab init.
      
      However I missed that the code relied on the allocations being zeroed,
      without which we could end up crashing:
      
        pci_bus 0000:00: busn_res: [bus 00-ff] end is updated to ff
        BUG: Unable to handle kernel data access on read at 0x6b6b6b6b6b6b6af7
        Faulting instruction address: 0xc0000000000dbc90
        Oops: Kernel access of bad area, sig: 11 [#1]
        LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
        ...
        NIP  pnv_ioda_get_pe_state+0xe0/0x1d0
        LR   pnv_ioda_get_pe_state+0xb4/0x1d0
        Call Trace:
          pnv_ioda_get_pe_state+0xb4/0x1d0 (unreliable)
          pnv_pci_config_check_eeh.isra.9+0x78/0x270
          pnv_pci_read_config+0xf8/0x160
          pci_bus_read_config_dword+0xa4/0x120
          pci_bus_generic_read_dev_vendor_id+0x54/0x270
          pci_scan_single_device+0xb8/0x140
          pci_scan_slot+0x80/0x1b0
          pci_scan_child_bus_extend+0x94/0x490
          pcibios_scan_phb+0x1f8/0x3c0
          pcibios_init+0x8c/0x12c
          do_one_initcall+0x94/0x510
          kernel_init_freeable+0x35c/0x3fc
          kernel_init+0x2c/0x168
          ret_from_kernel_thread+0x5c/0x70
      
      Switch them to kzalloc().
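
The difference can be modelled in user space with malloc() and calloc() standing in for the non-zeroing and zeroing allocators. This is a sketch only: the structure and its fields are hypothetical, and the 0x6b fill byte is used here to imitate the poison pattern that is visible in the faulting address above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical structure; the fields are not taken from the kernel. */
struct sim_phb {
	void *pe_array;
	unsigned int num_pe;
};

#define SIM_POISON 0x6b

/* Models a non-zeroing allocation on a build that poisons free memory. */
static struct sim_phb *sim_alloc_unzeroed(void)
{
	struct sim_phb *phb = malloc(sizeof(*phb));

	if (phb)
		memset(phb, SIM_POISON, sizeof(*phb));
	return phb;
}

/* Models a zeroing allocation: every field starts out as 0 / NULL. */
static struct sim_phb *sim_alloc_zeroed(void)
{
	return calloc(1, sizeof(struct sim_phb));
}

/* Code like this silently relies on the allocation having been zeroed. */
static unsigned int sim_pe_count(const struct sim_phb *phb)
{
	assert(phb != NULL);
	return phb->num_pe;
}
```

With the unzeroed variant, any field the code forgot to initialize holds the fill pattern rather than zero, so a later pointer dereference lands on an address made of repeated 0x6b bytes.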
      
      Fixes: fbbefb32 ("powerpc/pci: Move PHB discovery for PCI_DN using platforms")
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210211112749.3410771-1-mpe@ellerman.id.au
  2. 08 Feb, 2021 29 commits