1. 20 Oct, 2018 27 commits
    • Nicholas Piggin's avatar
    • selftests/powerpc: Add a test of wild bctr · b7683fc6
      Michael Ellerman authored
      This tests that a bctrl (Branch to Count Register and Link), ie. a
      function call, to a wildly out-of-bounds address is handled correctly.
      
      Some old kernel versions didn't handle it correctly, see eg:
      
        "powerpc/slb: Force a full SLB flush when we insert for a bad EA"
        https://lists.ozlabs.org/pipermail/linuxppc-dev/2017-April/157397.html
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm: Fix page table dump to work on Radix · 0d923962
      Michael Ellerman authored
      When we're running on Book3S with the Radix MMU enabled the page table
      dump currently prints the wrong addresses because it uses the wrong
      start address.
      
      Fix it to use PAGE_OFFSET rather than KERN_VIRT_START.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm/radix: Display if mappings are exec or not · afb6d064
      Michael Ellerman authored
      At boot we print the ranges we've mapped for the linear mapping and
      what page size we've used. Also track whether the range is mapped
      executable or not and display that as well.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm/radix: Simplify split mapping logic · 232aa407
      Michael Ellerman authored
      If we look closely at the logic in create_physical_mapping(), when
      we're doing STRICT_KERNEL_RWX, we do the following steps:
        - determine the gap from where we are to the end of the range
        - choose an appropriate mapping_size based on the gap
        - check if that mapping_size would overlap the __init_begin
          boundary, and if it would, choose a smaller mapping_size
      
      We can simplify the logic by taking the __init_begin boundary into
      account when we calculate the initial gap.
      
      So add a next_boundary() function which tells us what the next
      boundary is, either the __init_begin boundary or end. In future we can
      add more boundaries.
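
      A minimal sketch of what such a helper might look like, written as
      standalone C rather than kernel code. The init_begin parameter stands in
      for the physical address of __init_begin, and the signature is an
      assumption for illustration, not necessarily the one used in the patch.

```c
#include <stdint.h>

/*
 * Return the next address at which the linear mapping has to be split.
 * init_begin is a hypothetical parameter standing in for the physical
 * address of __init_begin; kernel code would read the symbol directly.
 */
static uint64_t next_boundary(uint64_t addr, uint64_t end, uint64_t init_begin)
{
	/* Still below the text/data boundary: stop there, unless the range ends first. */
	if (addr < init_begin && init_begin < end)
		return init_begin;

	/* Otherwise there is nothing to split on before the end of the range. */
	return end;
}
```

      A caller could then compute the gap as next_boundary(addr, end,
      init_begin) - addr before choosing a page size, so the chosen mapping
      never runs past the boundary.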
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm/radix: Remove the retry in the split mapping logic · 57306c66
      Michael Ellerman authored
      When we have CONFIG_STRICT_KERNEL_RWX enabled, we want to split the
      linear mapping at the text/data boundary so we can map the kernel
      text read only.
      
      The current logic uses a goto inside the for loop, which works, but is
      hard to reason about.
      
      When we hit the goto retry case we set max_mapping_size to PMD_SIZE
      and go back to the start.
      
      Setting max_mapping_size means we skip the PUD case and go to the PMD
      case.
      
      We know we will pass the alignment and gap checks because the only
      reason we are there is we hit the goto retry, and that is guarded by
      mapping_size == PUD_SIZE, which means addr is PUD aligned and gap is
      greater or equal to PUD_SIZE.
      
      So the only part of the check that can fail is the mmu_psize_defs
      check for the 2M page size.
      
      If we just duplicate that check we can avoid the goto, and we get the
      same result.
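
      The reasoning above can be sketched as a goto-free size selection. This
      is a simplified userspace model, not the kernel function: the helper
      name, the has_1g/has_2m flags (standing in for the mmu_psize_defs
      checks) and the boundary parameter are all assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define SZ_64K (64ULL << 10)
#define SZ_2M  (2ULL << 20)
#define SZ_1G  (1ULL << 30)

static bool is_aligned(uint64_t addr, uint64_t size)
{
	return (addr & (size - 1)) == 0;
}

/* True if [addr, addr + size) starts below the boundary and maps past it. */
static bool crosses(uint64_t addr, uint64_t size, uint64_t boundary)
{
	return addr < boundary && addr + size > boundary;
}

/* Hypothetical helper: pick the page size for the next chunk without a retry. */
static uint64_t pick_mapping_size(uint64_t addr, uint64_t gap,
				  bool has_1g, bool has_2m, uint64_t boundary)
{
	uint64_t size;

	if (is_aligned(addr, SZ_1G) && gap >= SZ_1G && has_1g)
		size = SZ_1G;
	else if (is_aligned(addr, SZ_2M) && gap >= SZ_2M && has_2m)
		size = SZ_2M;
	else
		size = SZ_64K;

	/*
	 * A 1G mapping that crosses the boundary implies addr is 1G aligned
	 * and gap >= 1G, so the 2M alignment and gap checks would pass too.
	 * Only the availability of the 2M page size needs re-checking.
	 */
	if (size == SZ_1G && crosses(addr, size, boundary))
		size = has_2m ? SZ_2M : SZ_64K;

	/* A 2M mapping that still crosses the boundary drops to small pages. */
	if (size == SZ_2M && crosses(addr, size, boundary))
		size = SZ_64K;

	return size;
}
```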
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm/radix: Fix small page at boundary when splitting · 81d1b54d
      Michael Ellerman authored
      When we have CONFIG_STRICT_KERNEL_RWX enabled, we want to split the
      linear mapping at the text/data boundary so we can map the kernel
      text read only.
      
      Currently we always use a small page at the text/data boundary, even
      when that's not necessary:
      
        Mapped 0x0000000000000000-0x0000000000e00000 with 2.00 MiB pages
        Mapped 0x0000000000e00000-0x0000000001000000 with 64.0 KiB pages
        Mapped 0x0000000001000000-0x0000000040000000 with 2.00 MiB pages
      
      This is because the check that the mapping crosses the __init_begin
      boundary is too strict: it also returns true when we map exactly up to
      the boundary.
      
      So fix it to check that the mapping would actually map past
      __init_begin, and with that we see:
      
        Mapped 0x0000000000000000-0x0000000040000000 with 2.00 MiB pages
        Mapped 0x0000000040000000-0x0000000100000000 with 1.00 GiB pages
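
      The difference between the two checks can be illustrated with a small
      standalone sketch. The function names are hypothetical; the addresses in
      the usage note come from the 0xe00000-0x1000000 range in the first
      output above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Too strict: also true when the mapping ends exactly at the boundary. */
static bool crosses_boundary_strict(uint64_t addr, uint64_t size, uint64_t boundary)
{
	return addr < boundary && addr + size >= boundary;
}

/* Fixed: only true when the mapping would actually map past the boundary. */
static bool crosses_boundary(uint64_t addr, uint64_t size, uint64_t boundary)
{
	return addr < boundary && addr + size > boundary;
}
```

      With addr = 0xe00000, a 2M mapping and the boundary at 0x1000000, the
      strict variant reports a crossing and forces 64K pages, while the fixed
      variant lets the 2M page be used.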
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm/radix: Fix overuse of small pages in splitting logic · 3b5657ed
      Michael Ellerman authored
      When we have CONFIG_STRICT_KERNEL_RWX enabled, we want to split the
      linear mapping at the text/data boundary so we can map the kernel text
      read only.
      
      But the current logic uses small pages for the entire text section,
      regardless of whether a larger page size would fit. eg. with the
      boundary at 16M we could use 2M pages, but instead we use 64K pages up
      to the 16M boundary:
      
        Mapped 0x0000000000000000-0x0000000001000000 with 64.0 KiB pages
        Mapped 0x0000000001000000-0x0000000040000000 with 2.00 MiB pages
        Mapped 0x0000000040000000-0x0000000100000000 with 1.00 GiB pages
      
      This is because the test is checking if addr is < __init_begin
      and addr + mapping_size is >= _stext. But that is true for all pages
      between _stext and __init_begin.
      
      Instead what we want to check is if we are crossing the text/data
      boundary, which is at __init_begin. With that fixed we see:
      
        Mapped 0x0000000000000000-0x0000000000e00000 with 2.00 MiB pages
        Mapped 0x0000000000e00000-0x0000000001000000 with 64.0 KiB pages
        Mapped 0x0000000001000000-0x0000000040000000 with 2.00 MiB pages
        Mapped 0x0000000040000000-0x0000000100000000 with 1.00 GiB pages
      
      ie. we're correctly using 2MB pages below __init_begin, but we still
      drop down to 64K pages unnecessarily at the boundary.
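
      A rough standalone model of the two tests, assuming _stext sits at
      physical address 0 and __init_begin at 16M as in the example. The
      function names and parameters are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Old test: true for every page between _stext and __init_begin. */
static bool old_needs_small_page(uint64_t addr, uint64_t size,
				 uint64_t stext, uint64_t init_begin)
{
	return addr < init_begin && addr + size >= stext;
}

/* New test: true only when the mapping reaches the text/data boundary. */
static bool new_needs_small_page(uint64_t addr, uint64_t size,
				 uint64_t init_begin)
{
	return addr < init_begin && addr + size >= init_begin;
}
```

      For a 2M mapping at address 0 the old test fires even though the mapping
      ends at 2M, far below the boundary. The new test only fires for the
      chunk starting at 0xe00000, which is the remaining 64K range in the
      output above.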
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm/radix: Fix off-by-one in split mapping logic · 5c6499b7
      Michael Ellerman authored
      When we have CONFIG_STRICT_KERNEL_RWX enabled, we try to split the
      kernel linear (1:1) mapping so that the kernel text is in a separate
      page to kernel data, so we can mark the former read-only.
      
      We could achieve that just by always using 64K pages for the linear
      mapping, but we try to be smarter. Instead we use huge pages when
      possible, and only switch to smaller pages when necessary.
      
      However we have an off-by-one bug in that logic, which causes us to
      calculate the wrong boundary between text and data.
      
      For example with the end of the kernel text at 16M we see:
      
        radix-mmu: Mapped 0x0000000000000000-0x0000000001200000 with 64.0 KiB pages
        radix-mmu: Mapped 0x0000000001200000-0x0000000040000000 with 2.00 MiB pages
        radix-mmu: Mapped 0x0000000040000000-0x0000000100000000 with 1.00 GiB pages
      
      ie. we mapped from 0 to 18M with 64K pages, even though the boundary
      between text and data is at 16M.
      
      With the fix we see we're correctly hitting the 16M boundary:
      
        radix-mmu: Mapped 0x0000000000000000-0x0000000001000000 with 64.0 KiB pages
        radix-mmu: Mapped 0x0000000001000000-0x0000000040000000 with 2.00 MiB pages
        radix-mmu: Mapped 0x0000000040000000-0x0000000100000000 with 1.00 GiB pages
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/ftrace: Handle large kernel configs · 67361cf8
      Naveen N. Rao authored
      Currently, we expect to be able to reach ftrace_caller() from all
      ftrace-enabled functions through a single relative branch. With large
      kernel configs, we see functions outside of 32MB of ftrace_caller()
      causing ftrace_init() to bail.
      
      In such configurations, gcc/ld emits two types of trampolines for mcount():
      1. A long_branch, which has a single branch to mcount() for functions that
         are one hop away from mcount():
      	c0000000019e8544 <00031b56.long_branch._mcount>:
      	c0000000019e8544:	4a 69 3f ac 	b       c00000000007c4f0 <._mcount>
      
      2. A plt_branch, for functions that are farther away from mcount():
      	c0000000051f33f8 <0008ba04.plt_branch._mcount>:
      	c0000000051f33f8:	3d 82 ff a4 	addis   r12,r2,-92
      	c0000000051f33fc:	e9 8c 04 20 	ld      r12,1056(r12)
      	c0000000051f3400:	7d 89 03 a6 	mtctr   r12
      	c0000000051f3404:	4e 80 04 20 	bctr
      
      We can reuse those trampolines for ftrace if we can have those
      trampolines go to ftrace_caller() instead. However, with ABIv2, we
      cannot depend on r2 being valid. As such, we use only the long_branch
      trampolines by patching those to instead branch to ftrace_caller or
      ftrace_regs_caller.
      
      In addition, we add additional trampolines around .text and .init.text
      to catch locations that are covered by the plt branches. This allows
      ftrace to work with most large kernel configurations.
      
      For now, we always patch the trampolines to go to ftrace_regs_caller,
      which is slightly inefficient. This can be optimized further at a later
      point.
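
      The 32MB limit comes from the signed 26-bit, word-aligned offset of a
      relative branch. A standalone sketch of the reachability test follows;
      branch_reachable() is a hypothetical name, and the addresses in the
      usage note are the trampoline and _mcount addresses from the
      disassembly above.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * A PowerPC relative branch (b/bl) encodes a signed 26-bit byte offset
 * whose low two bits are zero, so it reaches from -32MB to +32MB - 4.
 */
static bool branch_reachable(uint64_t from, uint64_t to)
{
	int64_t offset = (int64_t)(to - from);

	return offset >= -0x2000000 && offset <= 0x1fffffc && (offset & 0x3) == 0;
}
```

      The long_branch trampoline at c0000000019e8544 is about 26MB away from
      _mcount at c00000000007c4f0 and passes this test. The plt_branch
      trampoline at c0000000051f33f8 is about 81MB away and does not.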
      Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm: Fix WARN_ON with THP NUMA migration · dd0e144a
      Aneesh Kumar K.V authored
      WARNING: CPU: 12 PID: 4322 at /arch/powerpc/mm/pgtable-book3s64.c:76 set_pmd_at+0x4c/0x2b0
       Modules linked in:
       CPU: 12 PID: 4322 Comm: qemu-system-ppc Tainted: G        W         4.19.0-rc3-00758-g8f0c636b0542 #36
       NIP:  c0000000000872fc LR: c000000000484eec CTR: 0000000000000000
       REGS: c000003fba876fe0 TRAP: 0700   Tainted: G        W          (4.19.0-rc3-00758-g8f0c636b0542)
       MSR:  900000010282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>  CR: 24282884  XER: 00000000
       CFAR: c000000000484ee8 IRQMASK: 0
       GPR00: c000000000484eec c000003fba877268 c000000001f0ec00 c000003fbd229f80
       GPR04: 00007c8fe8e00000 c000003f864c5a38 860300853e0000c0 0000000000000080
       GPR08: 0000000080000000 0000000000000001 0401000000000080 0000000000000001
       GPR12: 0000000000002000 c000003fffff5400 c000003fce292000 00007c9024570000
       GPR16: 0000000000000000 0000000000ffffff 0000000000000001 c000000001885950
       GPR20: 0000000000000000 001ffffc0004807c 0000000000000008 c000000001f49d05
       GPR24: 00007c8fe8e00000 c0000000020f2468 ffffffffffffffff c000003fcd33b090
       GPR28: 00007c8fe8e00000 c000003fbd229f80 c000003f864c5a38 860300853e0000c0
       NIP [c0000000000872fc] set_pmd_at+0x4c/0x2b0
       LR [c000000000484eec] do_huge_pmd_numa_page+0xb1c/0xc20
       Call Trace:
       [c000003fba877268] [c00000000045931c] mpol_misplaced+0x1bc/0x230 (unreliable)
       [c000003fba8772c8] [c000000000484eec] do_huge_pmd_numa_page+0xb1c/0xc20
       [c000003fba877398] [c00000000040d344] __handle_mm_fault+0x5e4/0x2300
       [c000003fba8774d8] [c00000000040f400] handle_mm_fault+0x3a0/0x420
       [c000003fba877528] [c0000000003ff6f4] __get_user_pages+0x2e4/0x560
       [c000003fba877628] [c000000000400314] get_user_pages_unlocked+0x104/0x2a0
       [c000003fba8776c8] [c000000000118f44] __gfn_to_pfn_memslot+0x284/0x6a0
       [c000003fba877748] [c0000000001463a0] kvmppc_book3s_radix_page_fault+0x360/0x12d0
       [c000003fba877838] [c000000000142228] kvmppc_book3s_hv_page_fault+0x48/0x1300
       [c000003fba877988] [c00000000013dc08] kvmppc_vcpu_run_hv+0x1808/0x1b50
       [c000003fba877af8] [c000000000126b44] kvmppc_vcpu_run+0x34/0x50
       [c000003fba877b18] [c000000000123268] kvm_arch_vcpu_ioctl_run+0x288/0x2d0
       [c000003fba877b98] [c00000000011253c] kvm_vcpu_ioctl+0x1fc/0x8c0
       [c000003fba877d08] [c0000000004e9b24] do_vfs_ioctl+0xa44/0xae0
       [c000003fba877db8] [c0000000004e9c44] ksys_ioctl+0x84/0xf0
       [c000003fba877e08] [c0000000004e9cd8] sys_ioctl+0x28/0x80
      
      We removed the pte_protnone check earlier with the understanding that we
      mark the pte invalid before the set_pte/set_pmd usage. But the huge pmd
      autonuma code still uses set_pmd_at() directly. This is OK because a
      protnone pte won't have a translation cached in the TLB.
      
      Fixes: da7ad366 ("powerpc/mm/book3s: Update pmd_present to look at _PAGE_PRESENT bit")
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • selftests/powerpc: Fix out-of-tree build errors · d8a2fe29
      Michael Ellerman authored
      Some of our Makefiles don't do the right thing when building the
      selftests with O=, fix them up.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/time: no steal_time when CONFIG_PPC_SPLPAR is not selected · 51eeef9e
      Christophe Leroy authored
      If CONFIG_PPC_SPLPAR is not selected, steal_time will always
      be zero, so accounting it is pointless.
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/time: Only set CONFIG_ARCH_HAS_SCALED_CPUTIME on PPC64 · abcff86d
      Christophe Leroy authored
      Scaled cputime is only meaningful when the processor has
      SPURR and/or PURR, which means only on PPC64.
      
      Removing it on PPC32 significantly reduces the size of
      vtime_account_system() and vtime_account_idle() on an 8xx:
      
      Before:
      00000000 l     F .text	000000a8 vtime_delta
      00000280 g     F .text	0000010c vtime_account_system
      0000038c g     F .text	00000048 vtime_account_idle
      
      After:
      (vtime_delta gets inlined inside the two functions)
      000001d8 g     F .text	000000a0 vtime_account_system
      00000278 g     F .text	00000038 vtime_account_idle
      
      In terms of performance, we also get approximately 7% improvement on
      task switch. The following small benchmark app is run with perf stat:
      
      #define _GNU_SOURCE
      #include <pthread.h>
      #include <stdlib.h>

      /* Yield the CPU argv[1] times so the two threads keep switching. */
      void *thread(void *arg)
      {
      	int i;

      	for (i = 0; i < atoi((char*)arg); i++)
      		pthread_yield();

      	return NULL;
      }

      int main(int argc, char **argv)
      {
      	pthread_t th1, th2;

      	pthread_create(&th1, NULL, thread, argv[1]);
      	pthread_create(&th2, NULL, thread, argv[1]);
      	pthread_join(th1, NULL);
      	pthread_join(th2, NULL);

      	return 0;
      }
      
      Before the patch:
      
       Performance counter stats for 'chrt -f 98 ./sched 100000' (50 runs):
      
             8228.476465      task-clock (msec)         #    0.954 CPUs utilized            ( +-  0.23% )
                  200004      context-switches          #    0.024 M/sec                    ( +-  0.00% )
      
      After the patch:
      
       Performance counter stats for 'chrt -f 98 ./sched 100000' (50 runs):
      
             7649.070444      task-clock (msec)         #    0.955 CPUs utilized            ( +-  0.27% )
                  200004      context-switches          #    0.026 M/sec                    ( +-  0.00% )
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/time: isolate scaled cputime accounting in dedicated functions. · b38a181c
      Christophe Leroy authored
      Scaled cputime is only meaningful when the processor has
      SPURR and/or PURR, which means only on PPC64.
      
      In preparation for the following patch that will remove
      CONFIG_ARCH_HAS_SCALED_CPUTIME on PPC32, this patch moves
      all scaled cputime accounting logic into dedicated functions.
      
      This patch doesn't change any functionality. It's only code
      reorganisation.
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/kgdb: add kgdb_arch_set/remove_breakpoint() · fb978ca2
      Christophe Leroy authored
      Generic implementation fails to remove breakpoints after init
      when CONFIG_STRICT_KERNEL_RWX is selected:
      
      [   13.251285] KGDB: BP remove failed: c001c338
      [   13.259587] kgdbts: ERROR PUT: end of test buffer on 'do_fork_test' line 8 expected OK got $E14#aa
      [   13.268969] KGDB: re-enter exception: ALL breakpoints killed
      [   13.275099] CPU: 0 PID: 1 Comm: init Not tainted 4.18.0-g82bbb913ffd8 #860
      [   13.282836] Call Trace:
      [   13.285313] [c60e1ba0] [c0080ef0] kgdb_handle_exception+0x6f4/0x720 (unreliable)
      [   13.292618] [c60e1c30] [c000e97c] kgdb_handle_breakpoint+0x3c/0x98
      [   13.298709] [c60e1c40] [c000af54] program_check_exception+0x104/0x700
      [   13.305083] [c60e1c60] [c000e45c] ret_from_except_full+0x0/0x4
      [   13.310845] [c60e1d20] [c02a22ac] run_simple_test+0x2b4/0x2d4
      [   13.316532] [c60e1d30] [c0081698] put_packet+0xb8/0x158
      [   13.321694] [c60e1d60] [c00820b4] gdb_serial_stub+0x230/0xc4c
      [   13.327374] [c60e1dc0] [c0080af8] kgdb_handle_exception+0x2fc/0x720
      [   13.333573] [c60e1e50] [c000e928] kgdb_singlestep+0xb4/0xcc
      [   13.339068] [c60e1e70] [c000ae1c] single_step_exception+0x90/0xac
      [   13.345100] [c60e1e80] [c000e45c] ret_from_except_full+0x0/0x4
      [   13.350865] [c60e1f40] [c000e11c] ret_from_syscall+0x0/0x38
      [   13.356346] Kernel panic - not syncing: Recursive entry to debugger
      
      This patch creates powerpc-specific versions of
      kgdb_arch_set_breakpoint() and kgdb_arch_remove_breakpoint()
      using patch_instruction().
      
      Fixes: 1e0fc9d1 ("powerpc/Kconfig: Enable STRICT_KERNEL_RWX for some configs")
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/sysdev/ipic: check primary_ipic NULL pointer before using it · 6beb3381
      Christophe Leroy authored
      ipic_get_mcp_status() is used by targets implementing an NMI
      watchdog in a target-specific machine check handler in order
      to know whether a machine check results from a watchdog
      NMI reset.
      
      In case of very early machine check, primary_ipic pointer
      might not have been set yet, so ipic_get_mcp_status() needs
      to check it for nullity before using it.
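
      A minimal userspace sketch of the guard described above. The struct
      layout, the register index and the choice to return 0 when the
      controller is not yet set up are assumptions made for illustration.
      Only the names ipic_get_mcp_status() and primary_ipic come from the
      commit message.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the driver's private state (hypothetical layout). */
struct ipic {
	volatile uint32_t *regs;
};

#define IPIC_STATUS_WORD 0 /* hypothetical index of the MCP status register */

/* Stays NULL until the interrupt controller has been initialised. */
static struct ipic *primary_ipic;

static uint32_t ipic_get_mcp_status(void)
{
	/* A very early machine check can arrive before primary_ipic is set. */
	if (!primary_ipic)
		return 0;

	return primary_ipic->regs[IPIC_STATUS_WORD];
}
```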
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm: fix always true/false warning in slice.c · 37e9c674
      Christophe Leroy authored
      This patch fixes the following warnings (obtained with make W=1).
      
      arch/powerpc/mm/slice.c: In function 'slice_range_to_mask':
      arch/powerpc/mm/slice.c:73:12: error: comparison is always true due to limited range of data type [-Werror=type-limits]
        if (start < SLICE_LOW_TOP) {
                  ^
      arch/powerpc/mm/slice.c:81:20: error: comparison is always false due to limited range of data type [-Werror=type-limits]
        if ((start + len) > SLICE_LOW_TOP) {
                          ^
      arch/powerpc/mm/slice.c: In function 'slice_mask_for_free':
      arch/powerpc/mm/slice.c:136:17: error: comparison is always true due to limited range of data type [-Werror=type-limits]
        if (high_limit <= SLICE_LOW_TOP)
                       ^
      arch/powerpc/mm/slice.c: In function 'slice_check_range_fits':
      arch/powerpc/mm/slice.c:185:12: error: comparison is always true due to limited range of data type [-Werror=type-limits]
        if (start < SLICE_LOW_TOP) {
                  ^
      arch/powerpc/mm/slice.c:195:39: error: comparison is always false due to limited range of data type [-Werror=type-limits]
        if (SLICE_NUM_HIGH && ((start + len) > SLICE_LOW_TOP)) {
                                             ^
      arch/powerpc/mm/slice.c: In function 'slice_scan_available':
      arch/powerpc/mm/slice.c:306:11: error: comparison is always true due to limited range of data type [-Werror=type-limits]
        if (addr < SLICE_LOW_TOP) {
                 ^
      arch/powerpc/mm/slice.c: In function 'get_slice_psize':
      arch/powerpc/mm/slice.c:709:11: error: comparison is always true due to limited range of data type [-Werror=type-limits]
        if (addr < SLICE_LOW_TOP) {
                 ^
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm: fix missing prototypes in slice.c · aa5456ab
      Christophe Leroy authored
      This patch fixes the following warnings (obtained with make W=1).
      
      arch/powerpc/mm/slice.c: At top level:
      arch/powerpc/mm/slice.c:682:15: error: no previous prototype for 'arch_get_unmapped_area' [-Werror=missing-prototypes]
       unsigned long arch_get_unmapped_area(struct file *filp,
                     ^
      arch/powerpc/mm/slice.c:692:15: error: no previous prototype for 'arch_get_unmapped_area_topdown' [-Werror=missing-prototypes]
       unsigned long arch_get_unmapped_area_topdown(struct file *filp,
                     ^
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm: Trace tlbia instruction · 8114c36e
      Christophe Leroy authored
      Add a trace point for tlbia (Translation Lookaside Buffer Invalidate
      All) instruction.
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm: Add missing tracepoint for tlbie · cf4a6085
      Christophe Leroy authored
      Commit 0428491c ("powerpc/mm: Trace tlbie(l) instructions")
      added tracepoints for tlbie calls, but _tlbil_va() was forgotten.
      
      Fixes: 0428491c ("powerpc/mm: Trace tlbie(l) instructions")
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/book3s64: fix dump_linuxpagetables "present" flag · 3ff38e18
      Christophe Leroy authored
      Since commit bd0dbb73 ("powerpc/mm/books3s: Add new pte bit to
      mark pte temporarily invalid."), _PAGE_PRESENT doesn't mean exactly
      that a page is present. A page is also considered present when
      _PAGE_INVALID is set.
      
      This patch changes the meaning of "present" and adds a status "valid"
      associated with the _PAGE_PRESENT flag.
      
      Fixes: bd0dbb73 ("powerpc/mm/books3s: Add new pte bit to mark pte temporarily invalid.")
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/pseries: Export raw per-CPU VPA data via debugfs · c6c26fb5
      Aravinda Prasad authored
      This patch exports the raw per-CPU VPA data via debugfs.
      A per-CPU file is created which exports the VPA data of
      that CPU to help debug some of the VPA related issues or
      to analyze the per-CPU VPA related statistics.
      
      v3: Removed offline CPU check.
      
      v2: Included offline CPU check and other review comments.
      Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • selftests/powerpc: Add test to verify rfi flush across a system call · d2bf7932
      Naveen N. Rao authored
      This adds a test to verify proper functioning of the rfi flush
      capability implemented to mitigate meltdown. The test works by
      measuring the number of L1d cache misses encountered while loading
      data from memory. Across a system call, since the L1d cache is flushed
      when rfi_flush is enabled, the number of cache misses is expected to
      be relative to the number of cachelines corresponding to the data
      being loaded.
      
      The current system setting is reflected via powerpc/rfi_flush under
      debugfs (assumed to be /sys/kernel/debug/). This test verifies the
      expected result with rfi_flush enabled as well as when it is disabled.
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      [mpe: Add SPDX tags, clang format, skip if the debugfs is missing, use
       __u64 and SANE_USERSPACE_TYPES to avoid printf() build errors.]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • selftests/powerpc: Move UCONTEXT_NIA() into utils.h · db384851
      Naveen N. Rao authored
      ... so that it can be used by others.
      Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc64/module elfv1: Set opd addresses after module relocation · 59fe7eaf
      Naveen N. Rao authored
      module_frob_arch_sections() is called before the module is moved to its
      final location. The function descriptor section addresses we are setting
      here are thus invalid. Fix this by processing the opd section during
      module_finalize().
      
      Fixes: 5633e85b ("powerpc64: Add .opd based function descriptor dereference")
      Cc: stable@vger.kernel.org # v4.16
      Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Add support for function error injection · 7cd01b08
      Naveen N. Rao authored
      We implement regs_set_return_value() and override_function_with_return()
      for this purpose.
      
      On powerpc, a return from a function (blr) just branches to the location
      contained in the link register. So, we can just update pt_regs rather
      than redirecting execution to a dummy function that returns.
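
      As an illustration of the idea, the two hooks could look roughly like
      this. The struct below is a cut-down stand-in for the kernel's struct
      pt_regs with only the fields used here, so it is an assumption rather
      than the real layout. r3 is used because it carries the return value in
      the PowerPC ELF ABIs.

```c
#include <stdint.h>

/* Cut-down stand-in for the kernel's struct pt_regs (hypothetical layout). */
struct pt_regs {
	unsigned long gpr[32];
	unsigned long nip;  /* next instruction pointer */
	unsigned long link; /* link register */
};

/* Store the injected error code where the caller expects the return value. */
static void regs_set_return_value(struct pt_regs *regs, unsigned long rc)
{
	regs->gpr[3] = rc;
}

/* Emulate blr: resume execution at the address held in the link register. */
static void override_function_with_return(struct pt_regs *regs)
{
	regs->nip = regs->link;
}
```

      Applied at the entry of a probed function, this makes the function
      appear to return immediately with the injected value.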
      Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Reviewed-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  2. 19 Oct, 2018 3 commits
    • powerpc/time: Fix clockevent_decrementer initalisation for PR KVM · b4d16ab5
      Michael Ellerman authored
      In the recent commit 8b78fdb0 ("powerpc/time: Use
      clockevents_register_device(), fixing an issue with large
      decrementer") we changed the way we initialise the decrementer
      clockevent(s).
      
      We no longer initialise the mult & shift values of
      decrementer_clockevent itself.
      
      This has the effect of breaking PR KVM, because it uses those values
      in kvmppc_emulate_dec(). The symptom is guest kernels spin forever
      mid-way through boot.
      
      For now fix it by assigning back to decrementer_clockevent the mult
      and shift values.
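
      To show why the mult and shift values matter, here is a standalone
      sketch of the usual clockevent conversion between ticks and
      nanoseconds. It assumes kvmppc_emulate_dec() converts decrementer ticks
      using this standard scheme. The helper names, the 512 MHz example
      frequency and the zero guard are illustrative assumptions.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* mult such that ticks = (ns * mult) >> shift for a device ticking at freq Hz. */
static uint32_t calc_mult(uint64_t freq, unsigned int shift)
{
	return (uint32_t)((freq << shift) / NSEC_PER_SEC);
}

/* Inverse conversion: ns = (ticks << shift) / mult. */
static uint64_t ticks_to_ns(uint64_t ticks, uint32_t mult, unsigned int shift)
{
	/* An uninitialised clockevent (mult == 0) gives no meaningful answer. */
	if (mult == 0)
		return 0;

	return (ticks << shift) / mult;
}
```

      With a 512 MHz timebase and shift = 32, one second's worth of ticks
      converts back to 1000000000 ns. With mult and shift left at zero the
      conversion cannot produce a usable timeout.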
      
      Fixes: 8b78fdb0 ("powerpc/time: Use clockevents_register_device(), fixing an issue with large decrementer")
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/aout: Fix struct user definition to use user_pt_regs · 6ce7bff0
      Michael Ellerman authored
      I'm pretty sure this is dead code, it's only used by the a.out core
      dump code, and we don't support a.out. We should remove it.
      
      But while it's in the tree it should be using the ABI version of
      pt_regs which is called user_pt_regs in the kernel, because the whole
      struct is written to the core dump and so its size shouldn't change.
      
      Note this isn't a uapi header so we don't need an ifdef.
      
      Fixes: 002af939 ("powerpc: Split user/kernel definitions of struct pt_regs")
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/uapi: Fix sigcontext definition to use user_pt_regs · 22a3d03d
      Michael Ellerman authored
      My recent patch to split pt_regs between user and kernel missed
      the usage in struct sigcontext.
      
      Because this is a user visible struct it should be using the user
      visible definition, which when we're building for the kernel is called
      struct user_pt_regs.
      
      As far as I can see this hasn't actually caused a bug (yet), because
      we don't use sizeof() on sigcontext->regs anywhere. But we should
      still fix it to avoid confusion and future bugs.
      
      Fixes: 002af939 ("powerpc: Split user/kernel definitions of struct pt_regs")
      Reported-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  3. 18 Oct, 2018 10 commits