1. 12 Apr, 2017 5 commits
    • Oliver O'Halloran's avatar
      powerpc/mm: Add physical address to Linux page table dump · aaa22952
      Oliver O'Halloran authored
      The current page table dumper scans the Linux page tables and coalesces mappings
      with adjacent virtual addresses and similar PTE flags. This behaviour is
      somewhat broken when you consider the IOREMAP space where entirely unrelated
      mappings will appear to be virtually contiguous. This patch modifies the range
      coalescing so that only ranges that are both physically and virtually contiguous
      are combined. This patch also adds to the dump output the physical address at
      the start of each range.
      
      Fixes: 8eb07b18 ("powerpc/mm: Dump linux pagetables")
      Signed-off-by: default avatarOliver O'Halloran <oohall@gmail.com>
      [mpe: Print the physicall address with 0x like the other addresses]
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      aaa22952
    • Oliver O'Halloran's avatar
      powerpc/mm: Fix missing _PAGE_NON_IDEMPOTENT in pgtable dump · 70538eaa
      Oliver O'Halloran authored
      On Book3s we have two PTE flags used to mark cache-inhibited mappings:
      _PAGE_TOLERANT and _PAGE_NON_IDEMPOTENT. Currently the kernel page table dumper
      only looks at the generic _PAGE_NO_CACHE which is defined to be _PAGE_TOLERANT.
      This patch modifies the dumper so both flags are shown in the dump.
      
      Fixes: 8eb07b18 ("powerpc/mm: Dump linux pagetables")
      Signed-off-by: default avatarOliver O'Halloran <oohall@gmail.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      70538eaa
    • Balbir Singh's avatar
      powerpc/tracing: Allow tracing of mmap syscalls · 9c355917
      Balbir Singh authored
      Currently sys_mmap() and sys_mmap2() (32-bit only), are not visible to the
      syscall tracing machinery. This means users are not able to see the execution of
      mmap() syscalls using the syscall tracer.
      
      Fix that by using SYSCALL_DEFINE6 for sys_mmap() and sys_mmap2() so that the
      meta-data associated with these syscalls is visible to the syscall tracer.
      
      A side-effect of this change is that the return type has changed from unsigned
      long to long. However this should have no effect, the only code in the kernel
      which uses the result of these syscalls is in the syscall return path, which is
      written in asm and treats the result as unsigned regardless.
      
      Example output:
        cat-3399  [001] ....   196.542410: sys_mmap(addr: 7fff922a0000, len: 20000, prot: 3, flags: 812, fd: 3, offset: 1b0000)
        cat-3399  [001] ....   196.542443: sys_mmap -> 0x7fff922a0000
        cat-3399  [001] ....   196.542668: sys_munmap(addr: 7fff922c0000, len: 6d2c)
        cat-3399  [001] ....   196.542677: sys_munmap -> 0x0
      Signed-off-by: default avatarBalbir Singh <bsingharora@gmail.com>
      [mpe: Massage change log, add detail on return type change]
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      9c355917
    • Michael Ellerman's avatar
      powerpc/mm: Fix swapper_pg_dir size on 64-bit hash w/64K pages · 03dfee6d
      Michael Ellerman authored
      Recently in commit f6eedbba ("powerpc/mm/hash: Increase VA range to 128TB"),
      we increased H_PGD_INDEX_SIZE to 15 when we're building with 64K pages. This
      makes it larger than RADIX_PGD_INDEX_SIZE (13), which means the logic to
      calculate MAX_PGD_INDEX_SIZE in book3s/64/pgtable.h is wrong.
      
      The end result is that the PGD (Page Global Directory, ie top level page table)
      of the kernel (aka. swapper_pg_dir), is too small.
      
      This generally doesn't lead to a crash, as we don't use the full range in normal
      operation. However if we try to dump the kernel pagetables we can trigger a
      crash because we walk off the end of the pgd into other memory and eventually
      try to dereference something bogus:
      
        $ cat /sys/kernel/debug/kernel_pagetables
        Unable to handle kernel paging request for data at address 0xe8fece0000000000
        Faulting instruction address: 0xc000000000072314
        cpu 0xc: Vector: 380 (Data SLB Access) at [c0000000daa13890]
            pc: c000000000072314: ptdump_show+0x164/0x430
            lr: c000000000072550: ptdump_show+0x3a0/0x430
           dar: e802cf0000000000
        seq_read+0xf8/0x560
        full_proxy_read+0x84/0xc0
        __vfs_read+0x6c/0x1d0
        vfs_read+0xbc/0x1b0
        SyS_read+0x6c/0x110
        system_call+0x38/0xfc
      
      The root cause is that MAX_PGD_INDEX_SIZE isn't actually computed to be
      the max of H_PGD_INDEX_SIZE or RADIX_PGD_INDEX_SIZE. To fix that move
      the calculation into asm-offsets.c where we can do it easily using
      max().
      
      Fixes: f6eedbba ("powerpc/mm/hash: Increase VA range to 128TB")
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      03dfee6d
    • Michael Ellerman's avatar
      Merge branch 'topic/xive' (early part) into next · 3c19d5ad
      Michael Ellerman authored
      This merges the arch part of the XIVE support, leaving the final commit
      with the KVM specific pieces dangling on the branch for Paul to merge
      via the kvm-ppc tree.
      3c19d5ad
  2. 10 Apr, 2017 19 commits
  3. 07 Apr, 2017 1 commit
    • Benjamin Herrenschmidt's avatar
      powerpc/smp: Remove migrate_irq() custom implementation · a978e139
      Benjamin Herrenschmidt authored
      Some powerpc platforms use this to move IRQs away from a CPU being
      unplugged. This function has several bugs such as not taking the right
      locks or failing to NULL check pointers.
      
      There's a new generic function doing exactly the same thing without all
      the bugs, so let's use it instead.
      
      mpe: The obvious place for the select of GENERIC_IRQ_MIGRATION is on
      HOTPLUG_CPU, but that doesn't work. On some configs PM_SLEEP_SMP will
      select HOTPLUG_CPU even though its dependencies are not met, which means
      the select of GENERIC_IRQ_MIGRATION doesn't happen. That leads to the
      build breaking. Fix it by moving the select of GENERIC_IRQ_MIGRATION to
      SMP.
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      a978e139
  4. 06 Apr, 2017 3 commits
  5. 04 Apr, 2017 3 commits
    • Matt Brown's avatar
      powerpc/powernv: Add OPAL exports attributes to sysfs · 11fe909d
      Matt Brown authored
      New versions of OPAL have a device node /ibm,opal/firmware/exports, each
      property of which describes a range of memory in OPAL that Linux might
      want to export to userspace for debugging.
      
      This patch adds a sysfs file under 'opal/exports' for each property
      found there, and makes it read-only by root.
      Signed-off-by: default avatarMatt Brown <matthew.brown.dev@gmail.com>
      [mpe: Drop counting of props, rename to attr, free on sysfs error, c'log]
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      11fe909d
    • Sukadev Bhattiprolu's avatar
      powerpc/prom: Increase minimum RMA size to 512MB · 687da8fc
      Sukadev Bhattiprolu authored
      When booting very large systems with a large initrd, we run out of
      space early in boot for either RTAS or the flattened device tree (FDT).
      Boot fails with messages like:
      
      	Could not allocate memory for RTAS
      or
      	No memory for flatten_device_tree (no room)
      
      Increasing the minimum RMA size to 512MB fixes the problem. This
      should not have an impact on smaller LPARs (with 256MB memory),
      as the firmware will cap the RMA to the memory assigned to the LPAR.
      
      Fix is based on input/discussions with Michael Ellerman. Thanks to
      Praveen K. Pandey for testing on a large system.
      Reported-by: default avatarPraveen K. Pandey <preveen.pandey@in.ibm.com>
      Signed-off-by: default avatarSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      687da8fc
    • Alistair Popple's avatar
      powerpc/powernv: Introduce address translation services for Nvlink2 · 1ab66d1f
      Alistair Popple authored
      Nvlink2 supports address translation services (ATS) allowing devices
      to request address translations from an mmu known as the nest MMU
      which is setup to walk the CPU page tables.
      
      To access this functionality certain firmware calls are required to
      setup and manage hardware context tables in the nvlink processing unit
      (NPU). The NPU also manages forwarding of TLB invalidates (known as
      address translation shootdowns/ATSDs) to attached devices.
      
      This patch exports several methods to allow device drivers to register
      a process id (PASID/PID) in the hardware tables and to receive
      notification of when a device should stop issuing address translation
      requests (ATRs). It also adds a fault handler to allow device drivers
      to demand fault pages in.
      Signed-off-by: default avatarAlistair Popple <alistair@popple.id.au>
      [mpe: Fix up comment formatting, use flush_tlb_mm()]
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      1ab66d1f
  6. 03 Apr, 2017 6 commits
  7. 01 Apr, 2017 3 commits
    • Aneesh Kumar K.V's avatar
      powerpc/mm: Enable mappings above 128TB · f4ea6dcb
      Aneesh Kumar K.V authored
      Not all user space application is ready to handle wide addresses. It's
      known that at least some JIT compilers use higher bits in pointers to
      encode their information. It collides with valid pointers with 512TB
      addresses and leads to crashes.
      
      To mitigate this, we are not going to allocate virtual address space
      above 128TB by default.
      
      But userspace can ask for allocation from full address space by
      specifying hint address (with or without MAP_FIXED) above 128TB.
      
      If hint address set above 128TB, but MAP_FIXED is not specified, we try
      to look for unmapped area by specified address. If it's already
      occupied, we look for unmapped area in *full* address space, rather than
      from 128TB window.
      
      This approach helps to easily make application's memory allocator aware
      about large address space without manually tracking allocated virtual
      address space.
      
      This is going to be a per mmap decision. ie, we can have some mmaps with
      larger addresses and other that do not.
      
      A sample memory layout looks like:
      
        10000000-10010000 r-xp 00000000 fc:00 9057045          /home/max_addr_512TB
        10010000-10020000 r--p 00000000 fc:00 9057045          /home/max_addr_512TB
        10020000-10030000 rw-p 00010000 fc:00 9057045          /home/max_addr_512TB
        10029630000-10029660000 rw-p 00000000 00:00 0          [heap]
        7fff834a0000-7fff834b0000 rw-p 00000000 00:00 0
        7fff834b0000-7fff83670000 r-xp 00000000 fc:00 9177190  /lib/powerpc64le-linux-gnu/libc-2.23.so
        7fff83670000-7fff83680000 r--p 001b0000 fc:00 9177190  /lib/powerpc64le-linux-gnu/libc-2.23.so
        7fff83680000-7fff83690000 rw-p 001c0000 fc:00 9177190  /lib/powerpc64le-linux-gnu/libc-2.23.so
        7fff83690000-7fff836a0000 rw-p 00000000 00:00 0
        7fff836a0000-7fff836c0000 r-xp 00000000 00:00 0        [vdso]
        7fff836c0000-7fff83700000 r-xp 00000000 fc:00 9177193  /lib/powerpc64le-linux-gnu/ld-2.23.so
        7fff83700000-7fff83710000 r--p 00030000 fc:00 9177193  /lib/powerpc64le-linux-gnu/ld-2.23.so
        7fff83710000-7fff83720000 rw-p 00040000 fc:00 9177193  /lib/powerpc64le-linux-gnu/ld-2.23.so
        7fffdccf0000-7fffdcd20000 rw-p 00000000 00:00 0        [stack]
        1000000000000-1000000010000 rw-p 00000000 00:00 0
        1ffff83710000-1ffff83720000 rw-p 00000000 00:00 0
      Signed-off-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      f4ea6dcb
    • Aneesh Kumar K.V's avatar
    • Aneesh Kumar K.V's avatar
      powerpc/pseries: Skip using reserved virtual address range · 82228e36
      Aneesh Kumar K.V authored
      Now that we use all the available virtual address range, we need to make
      sure we don't generate VSID such that it overlaps with the reserved vsid
      range. Reserved vsid range include the virtual address range used by the
      adjunct partition and also the VRMA virtual segment. We find the context
      value that can result in generating such a VSID and reserve it early in
      boot.
      
      We don't look at the adjunct range, because for now we disable the
      adjunct usage in a Linux LPAR via CAS interface.
      Signed-off-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      [mpe: Rewrite hash__reserve_context_id(), move the rest into pseries]
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      82228e36