1. 20 Oct, 2018 6 commits
  2. 19 Oct, 2018 3 commits
    • powerpc/time: Fix clockevent_decrementer initalisation for PR KVM · b4d16ab5
      Michael Ellerman authored
      In the recent commit 8b78fdb0 ("powerpc/time: Use
      clockevents_register_device(), fixing an issue with large
      decrementer") we changed the way we initialise the decrementer
      clockevent(s).
      
      We no longer initialise the mult & shift values of
      decrementer_clockevent itself.
      
      This has the effect of breaking PR KVM, because it uses those values
      in kvmppc_emulate_dec(). The symptom is that guest kernels spin forever
      midway through boot.

      For now, fix it by assigning the mult and shift values back to
      decrementer_clockevent.
      
      Fixes: 8b78fdb0 ("powerpc/time: Use clockevents_register_device(), fixing an issue with large decrementer")
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
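The role of the mult and shift values in the decrementer fix above can be illustrated with a minimal sketch of the usual clockevent scaling convention, where ticks = (ns * mult) >> shift. The struct and function names here are hypothetical and are not the kernel's own code, and the frequency used in the usage note is purely illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the mult/shift pair of a clockevent device. */
struct clockevent_scale {
    uint32_t mult;
    uint32_t shift;
};

/* Choose mult so that (ns * mult) >> shift approximates ns * freq_hz / 1e9. */
static struct clockevent_scale scale_for_freq(uint64_t freq_hz, uint32_t shift)
{
    struct clockevent_scale s;
    uint64_t mult;

    assert(shift < 32 && freq_hz <= UINT32_MAX);
    mult = (freq_hz << shift) / 1000000000ULL;
    assert(mult != 0 && mult <= UINT32_MAX);

    s.mult = (uint32_t)mult;
    s.shift = shift;
    return s;
}

/* Nanoseconds to device ticks. May overflow for very large ns * mult. */
static uint64_t ns_to_ticks(uint64_t ns, struct clockevent_scale s)
{
    return (ns * s.mult) >> s.shift;
}

/* Device ticks back to nanoseconds. Undefined if mult was never set. */
static uint64_t ticks_to_ns(uint64_t ticks, struct clockevent_scale s)
{
    assert(s.mult != 0);
    return (ticks << s.shift) / s.mult;
}
```

With an assumed 500 MHz clock and shift = 1, mult comes out as 1, so 1000 ns maps to 500 ticks and back. A consumer that reads a mult of zero cannot perform the reverse conversion at all, which is why code relying on these fields needs them initialised.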
    • powerpc/aout: Fix struct user definition to use user_pt_regs · 6ce7bff0
      Michael Ellerman authored
      I'm pretty sure this is dead code: it's only used by the a.out core
      dump code, and we don't support a.out. We should remove it.
      
      But while it's in the tree it should be using the ABI version of
      pt_regs which is called user_pt_regs in the kernel, because the whole
      struct is written to the core dump and so its size shouldn't change.
      
      Note this isn't a uapi header so we don't need an ifdef.
      
      Fixes: 002af939 ("powerpc: Split user/kernel definitions of struct pt_regs")
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/uapi: Fix sigcontext definition to use user_pt_regs · 22a3d03d
      Michael Ellerman authored
      My recent patch to split pt_regs between user and kernel missed
      the usage in struct sigcontext.
      
      Because this is a user visible struct it should be using the user
      visible definition, which when we're building for the kernel is called
      struct user_pt_regs.
      
      As far as I can see this hasn't actually caused a bug (yet), because
      we don't take sizeof() of sigcontext->regs anywhere. But we should
      still fix it to avoid confusion and future bugs.
      
      Fixes: 002af939 ("powerpc: Split user/kernel definitions of struct pt_regs")
      Reported-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  3. 18 Oct, 2018 19 commits
  4. 14 Oct, 2018 12 commits
    • powerpc/mm: Increase the max addressable memory to 2PB · 4ffe713b
      Aneesh Kumar K.V authored
      Currently we limit the max addressable memory to 128TB. This patch increases
      the limit to 2PB. Devices such as nvdimm can add memory above the 512TB
      limit.

      We still don't support regular system RAM above 512TB. One of the challenges
      with that is the percpu allocator, which allocates per-node memory and uses
      the max distance between them as the percpu offsets. This means that with a
      large gap in the address space (system RAM above 1PB) we will run out of
      vmalloc space to map the percpu allocation.

      In order to support addressable memory above 512TB, the kernel should be able
      to linear map this range. To do that with hash translation we now add 4
      contexts to the kernel linear map region. Our per-context addressable range
      is 512TB. We still keep the VMALLOC and VMEMMAP regions at the old size. The
      SLB miss handlers are updated to validate these limits.

      We also limit this update to SPARSEMEM_VMEMMAP and SPARSEMEM_EXTREME.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
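The arithmetic behind the 2PB limit in the commit above (4 contexts of 512TB each) can be checked with a short sketch. The macro and function names are hypothetical and do not correspond to the kernel's actual definitions; only the 512TB-per-context and 4-context figures come from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* 2^49 bytes = 512TB addressable per context (figure from the commit text). */
#define CONTEXT_SHIFT 49
/* Four contexts cover the kernel linear map (figure from the commit text). */
#define LINEAR_MAP_CONTEXTS 4

/* 4 x 512TB must equal 2PB = 2^51 bytes. */
static_assert(((uint64_t)LINEAR_MAP_CONTEXTS << CONTEXT_SHIFT) == (UINT64_C(1) << 51),
              "4 contexts of 512TB should cover 2PB");

/* Total size of the linear map region in bytes. */
static uint64_t linear_map_limit(void)
{
    return (uint64_t)LINEAR_MAP_CONTEXTS << CONTEXT_SHIFT;
}

/*
 * Map an offset inside the linear map to a context index (0..3).
 * Returns -1 for offsets beyond the limit, mirroring the idea that
 * the miss handler validates the range.
 */
static int linear_context_index(uint64_t offset)
{
    if (offset >= linear_map_limit())
        return -1;
    return (int)(offset >> CONTEXT_SHIFT);
}
```

For example, an offset of exactly 512TB falls in context 1, the last byte below 2PB falls in context 3, and an offset of 2PB is rejected.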
    • powerpc/mm/hash: Rename get_ea_context to get_user_context · c9f80734
      Aneesh Kumar K.V authored
      We will be adding get_kernel_context later. Update the function name to
      indicate that it handles context allocation for user-space addresses.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s/hash: Add some SLB debugging tests · e15a4fea
      Nicholas Piggin authored
      This adds CONFIG_DEBUG_VM checks to ensure:
        - The kernel stack is in the SLB after it's flushed and bolted.
        - We don't insert an SLB for an address that is already in the SLB.
        - The kernel SLB miss handler does not take an SLB miss.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s/hash: Simplify slb_flush_and_rebolt() · 94ee4272
      Nicholas Piggin authored
      The name slb_flush_and_rebolt() is misleading. It is called in virtual
      mode, so it cannot possibly change the stack, and so it should not be
      touching the shadow area. And since vmalloc is no longer bolted, it
      should not change any bolted mappings at all.
      
      Change the name to slb_flush_and_restore_bolted(), and have it just
      load the kernel stack from what's currently in the shadow SLB area.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s/hash: Add a SLB preload cache · 5434ae74
      Nicholas Piggin authored
      When switching processes, currently all user SLBEs are cleared, and a
      few (exec_base, pc, and stack) are preloaded. In trivial testing with
      small apps, this tends to miss the heap and low 256MB segments, and it
      will also miss commonly accessed segments on large memory workloads.
      
      Add a simple round-robin preload cache that just inserts the last SLB
      miss into the head of the cache and preloads those at context switch
      time. Every 256 context switches, the oldest entry is removed from the
      cache to shrink the cache and require fewer slbmte if they are unused.
      
      Much more could go into this, including into the SLB entry reclaim
      side to track some LRU information etc, which would require a study of
      large memory workloads. But this is a simple thing we can do now that
      is an obvious win for common workloads.
      
      With the full series, process switching speed on the context_switch
      benchmark on POWER9/hash (with kernel speculation security measures
      disabled) increases from 140K/s to 178K/s (27%).

      POWER8 does not change much (within 1%); it's unclear why it does not
      see a big gain like POWER9.
      
      Booting to busybox init with 256MB segments has SLB misses go down
      from 945 to 69, and with 1T segments 900 to 21. These could almost all
      be eliminated by preloading a bit more carefully with ELF binary
      loading.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
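The preload cache described in the commit above can be sketched as a small ring buffer: each new miss is recorded as the newest entry, and the oldest entry is dropped once every 256 context switches. This is only an illustration under assumptions; the capacity of 16, the struct layout, and the function names are all hypothetical and are not taken from the kernel source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PRELOAD_NR 16 /* assumed capacity, not from the commit */

struct preload_cache {
    uint32_t esid[PRELOAD_NR]; /* segment identifiers to preload */
    unsigned tail;             /* index of the oldest entry */
    unsigned nr;               /* number of valid entries */
    uint8_t switches;          /* wraps to 0 every 256 context switches */
};

/* Record a missed segment. Returns false if it was already cached. */
static bool preload_add(struct preload_cache *c, uint32_t esid)
{
    assert(c->nr <= PRELOAD_NR);

    for (unsigned i = 0; i < c->nr; i++)
        if (c->esid[(c->tail + i) % PRELOAD_NR] == esid)
            return false;

    /* When full, this slot is the oldest entry and gets overwritten. */
    c->esid[(c->tail + c->nr) % PRELOAD_NR] = esid;
    if (c->nr == PRELOAD_NR)
        c->tail = (c->tail + 1) % PRELOAD_NR;
    else
        c->nr++;
    return true;
}

/* Call once per context switch: every 256th call drops the oldest entry. */
static void preload_age(struct preload_cache *c)
{
    if (++c->switches != 0)
        return;
    if (c->nr > 0) {
        c->tail = (c->tail + 1) % PRELOAD_NR;
        c->nr--;
    }
}
```

At context switch time, a caller would walk the nr valid entries starting at tail and preload each one. The periodic ageing keeps entries that are no longer touched from costing an slbmte on every switch.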
    • powerpc/64s/hash: Provide arch_setup_exec() hooks for hash slice setup · 425d3314
      Nicholas Piggin authored
      This will be used by the SLB code in the next patch, but for now this
      sets the slb_addr_limit to the correct size for 32-bit tasks.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s/hash: Add SLB allocation status bitmaps · 126b11b2
      Nicholas Piggin authored
      Add 32-entry bitmaps to track the allocation status of the first 32
      SLB entries, and whether they are user or kernel entries. These are
      used to allocate free SLB entries first, before resorting to the round
      robin allocator.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
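The allocation policy described in the commit above (use a free entry if the bitmap shows one, otherwise fall back to round robin) might look roughly like the following. The struct, the function names, and the helper that frees all user entries are assumptions made for illustration, not the kernel's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SLB_TRACKED 32 /* the first 32 entries are tracked, per the commit */

struct slb_state {
    uint32_t used_bitmap;   /* bit i set: entry i is allocated */
    uint32_t kernel_bitmap; /* bit i set: entry i holds a kernel mapping */
    unsigned rr_next;       /* next victim for the round-robin fallback */
};

/* Allocate an SLB index, preferring a free entry over evicting one. */
static unsigned slb_alloc(struct slb_state *s, bool kernel)
{
    unsigned idx = 0;
    uint32_t bit;

    if (s->used_bitmap != UINT32_MAX) {
        /* At least one bit is clear, so this loop stops below 32. */
        while (s->used_bitmap & (UINT32_C(1) << idx))
            idx++;
    } else {
        idx = s->rr_next;
        s->rr_next = (idx + 1) % SLB_TRACKED;
    }
    assert(idx < SLB_TRACKED);

    bit = UINT32_C(1) << idx;
    s->used_bitmap |= bit;
    if (kernel)
        s->kernel_bitmap |= bit;
    else
        s->kernel_bitmap &= ~bit;
    return idx;
}

/* Hypothetical helper: release every user entry, keep kernel entries. */
static void slb_free_user(struct slb_state *s)
{
    s->used_bitmap &= s->kernel_bitmap;
}
```

Tracking which entries are kernel entries lets a caller free all user entries in one mask operation, after which new allocations reuse those slots before any eviction is needed. This sketch ignores bolted entries, which a real allocator would never hand out as round-robin victims.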
    • powerpc/64s/hash: Convert SLB miss handlers to C · 48e7b769
      Nicholas Piggin authored
      This patch moves SLB miss handlers completely to C, using the standard
      exception handler macros to set up the stack and branch to C.
      
      This can be done because the segment containing the kernel stack is
      always bolted, so accessing it with relocation on will not cause an
      SLB exception.
      
      Arbitrary kernel memory must not be accessed when handling kernel
      space SLB misses, so care should be taken there. However user SLB
      misses can access any kernel memory, which can be used to move some
      fields out of the paca (in later patches).
      
      User SLB misses could quite easily reconcile IRQs and set up a first
      class kernel environment and exit via ret_from_except, however that
      doesn't seem to be necessary at the moment, so we only do that if a
      bad fault is encountered.
      
      [ Credit to Aneesh for bug fixes, error checks, and improvements to
        bad address handling, etc ]
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      [mpe: Disallow tracing for all of slb.c for now.]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64: Interrupts save PPR on stack rather than thread_struct · 4c2de74c
      Nicholas Piggin authored
      PPR is the odd register out when it comes to interrupt handling: it is
      saved in current->thread.ppr while all others are saved on the stack.

      The difficulty with this is that accessing thread.ppr can cause an SLB
      fault, but the change that implemented the SLB fault handler in C
      assumed the normal exception entry handlers would not cause an SLB
      fault.
      
      Fix this by allocating room in the interrupt stack to save PPR.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/ptrace: Don't use sizeof(struct pt_regs) in ptrace code · 3eeacd9f
      Michael Ellerman authored
      Now that we've split the user & kernel versions of pt_regs we need to
      be more careful in the ptrace code.
      
      For now we've ensured the location of the fields in both structs is
      the same, so most of the ptrace code doesn't need updating.
      
      But there are a few places where we use sizeof(pt_regs), and these
      will be wrong as soon as we increase the size of the kernel structure.
      
      So flip them all to use sizeof(user_pt_regs).
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
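The sizeof hazard described in the ptrace commit above can be shown with a simplified pair of structs. The field lists below are heavily cut down and the kernel_private member is invented for the example; the real structures are larger and laid out differently. The sketch assumes only what the commit states: shared fields sit at the same offsets, and the kernel struct may grow.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified ABI-visible layout: must never change size. */
struct user_pt_regs {
    unsigned long gpr[32];
    unsigned long nip;
    unsigned long msr;
};

/* Simplified kernel layout: same leading fields plus a hypothetical extra. */
struct pt_regs {
    unsigned long gpr[32];
    unsigned long nip;
    unsigned long msr;
    unsigned long kernel_private; /* invented kernel-only field */
};

/* The shared fields are expected to stay at identical offsets. */
static_assert(offsetof(struct pt_regs, nip) == offsetof(struct user_pt_regs, nip),
              "nip offset must match");
static_assert(offsetof(struct pt_regs, msr) == offsetof(struct user_pt_regs, msr),
              "msr offset must match");

/* Copy only the user-visible part of the registers into buf. */
static size_t export_regs(void *buf, size_t len, const struct pt_regs *regs)
{
    size_t n = sizeof(struct user_pt_regs); /* not sizeof(struct pt_regs) */

    if (len < n)
        return 0;
    memcpy(buf, regs, n);
    return n;
}
```

Had export_regs() used sizeof(struct pt_regs), it would have copied kernel_private into the user-visible buffer and changed the number of bytes written as soon as the kernel struct grew.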
    • powerpc: Split user/kernel definitions of struct pt_regs · 002af939
      Michael Ellerman authored
      We use a shared definition for struct pt_regs in uapi/asm/ptrace.h.
      That means the layout of the structure is ABI, i.e. we can't change it.
      
      That would be fine if it was only used to describe the user-visible
      register state of a process, but it's also the struct we use in the
      kernel to describe the registers saved in an interrupt frame.
      
      We'd like more flexibility in the content (and possibly layout) of the
      kernel version of the struct, but currently that's not possible.
      
      So split the definition into a user-visible definition which remains
      unchanged, and a kernel internal one.
      
      At the moment they're still identical, and we check that at build
      time. That's because we have code (in ptrace etc.) that assumes that
      they are the same. We will fix that code in future patches, and then
      we can break the strict symmetry between the two structs.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • Benjamin Herrenschmidt's avatar
      7f995d3b