1. 11 Nov, 2017 1 commit
  2. 10 Nov, 2017 8 commits
    • powerpc/64: Set DSCR default initially from SPR · 1696d0fb
      Nicholas Piggin authored
      Take the DSCR value set by firmware as the dscr_default value,
      rather than zero.
      
      POWER9 recommends DSCR default to a non-zero value.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      [mpe: Make record_spr_defaults() __init]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv: Avoid waiting for secondary hold spinloop with OPAL · 339a3293
      Nicholas Piggin authored
      OPAL boot does not insert secondaries at 0x60 to wait at the secondary
      hold spinloop. Instead they are started later, and inserted at
      generic_secondary_smp_init(), which is after the secondary hold
      spinloop.
      
      Avoid waiting on this spinloop when booting with OPAL firmware. This
      wait always times out in that case.
      
      This saves 100ms boot time on powernv, and 10s of seconds of real time
      when booting on the simulator in SMP.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s/radix: Improve TLB flushing for page table freeing · 0b2f5a8a
      Nicholas Piggin authored
      Unmaps that free page tables always flush the entire PID, which is
      sub-optimal. Provide TLB range flushing with an additional PWC flush
      that can be used for va range invalidations with PWC flush.
      
           Time to munmap N pages of memory including last level page table
           teardown (after mmap, touch), local invalidate:
           N           1       2      4      8     16     32     64
           vanilla  3.2us  3.3us  3.4us  3.6us  4.1us  5.2us  7.2us
           patched  1.4us  1.5us  1.7us  1.9us  2.6us  3.7us  6.2us
      
           Global invalidate:
           N           1       2      4      8     16      32     64
           vanilla  2.2us  2.3us  2.4us  2.6us  3.2us   4.1us  6.2us
           patched  2.1us  2.5us  3.4us  5.2us  8.7us  15.7us  6.2us
      
      Local invalidates get much better across the board. Global ones have
      the same issue where multiple tlbies for va flush do get slower than
      the single tlbie to invalidate the PID. None of this test captures
      the TLB benefits of avoiding killing everything.
      
      Global gets worse, but it is brought in to line with global invalidate
      for munmap()s that do not free page tables.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s/radix: Introduce local single page ceiling for TLB range flush · f6f27951
      Nicholas Piggin authored
      The single page flush ceiling is the cut-off point at which we switch
      from invalidating individual pages, to invalidating the entire process
      address space in response to a range flush.
      
      Introduce a local variant of this heuristic because local and global
      tlbie have significantly different properties:
      - Local tlbiel requires 128 instructions to invalidate a PID, global
        tlbie only 1 instruction.
      - Global tlbie instructions are expensive broadcast operations.
      
      The local ceiling has been made much higher, 2x the number of
      instructions required to invalidate the entire PID (i.e., 256 pages).
      
           Time to mprotect N pages of memory (after mmap, touch), local invalidate:
           N           32     34      64     128     256     512
           vanilla  7.4us  9.0us  14.6us  26.4us  50.2us  98.3us
           patched  7.4us  7.8us  13.8us  26.4us  51.9us  98.3us
      
      The behaviour of both is identical at N=32 and N=512. Between there,
      the vanilla kernel does a PID invalidate and the patched kernel does
      a va range invalidate.
      
      At N=128, these require the same number of tlbiel instructions, so
      the patched version can be seen to be cheaper when < 128, and more
      expensive when > 128. However this does not well capture the cost
      of invalidated TLB.
      
      The additional cost at 256 pages does not seem prohibitive. It may
      be the case that increasing the limit further would continue to be
      beneficial to avoid invalidating all of the process's TLB entries.
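
The ceiling heuristic described in this message can be sketched as a small userspace model. The figure of 128 tlbiel instructions and the local ceiling of 256 pages come from the text above; the global ceiling of 33 pages is an assumption inferred from the 34-page switch-over point mentioned elsewhere in this log, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values only. 128 is the tlbiel count quoted in the commit
 * message; the global ceiling of 33 is an assumption, not taken from the
 * patch itself. */
#define TLBIEL_PER_PID_FLUSH  128UL
#define GLOBAL_PAGE_CEILING    33UL
#define LOCAL_PAGE_CEILING    (2 * TLBIEL_PER_PID_FLUSH)   /* 256 pages */

/* Returns true when a range of nr_pages should be handled by invalidating
 * the whole PID rather than one page at a time. */
static bool flush_whole_pid(unsigned long nr_pages, bool local)
{
    unsigned long ceiling = local ? LOCAL_PAGE_CEILING : GLOBAL_PAGE_CEILING;

    assert(nr_pages > 0);
    return nr_pages > ceiling;
}
```

Under these assumptions a local flush of 34 to 256 pages stays a va range flush, while a global flush of the same size falls back to a PID invalidate, which matches the region of the table where the vanilla and patched timings differ.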
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s/radix: Optimize flush_tlb_range · cbf09c83
      Nicholas Piggin authored
      Currently for radix, flush_tlb_range flushes the entire PID, because
      the Linux mm code does not tell us about page size here for THP vs
      regular pages. This is quite sub-optimal for small mremap / mprotect
      / change_protection.
      
      So implement va range flushes with two flush passes, one for each
      page size (regular and THP). The second flush has an order of magnitude
      fewer tlbie instructions than the first, so it is a relatively small
      additional cost.
      
      There is still room for improvement here with some changes to generic
      APIs, particularly if there are mostly THP pages to be invalidated,
      the small page flushes could be reduced.
      
      Time to mprotect 1 page of memory (after mmap, touch):
      vanilla 2.9us   1.8us
      patched 1.2us   1.6us
      
      Time to mprotect 30 pages of memory (after mmap, touch):
      vanilla 8.2us   7.2us
      patched 6.9us   17.9us
      
      Time to mprotect 34 pages of memory (after mmap, touch):
      vanilla 9.1us   8.0us
      patched 9.0us   8.0us
      
      34 pages is the point at which the invalidation switches from va
      to entire PID, which tlbie can do in a single instruction. This is
      why in the case of 30 pages, the new code runs slower for this test.
      This is a deliberate tradeoff already present in the unmap and THP
      promotion code; the idea is that there is a benefit from avoiding flushing
      the entire TLB for this PID on all threads in the system.
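
To make the "order of magnitude fewer" claim concrete, the following sketch counts how many per-page invalidations each pass would issue over the same range. The page sizes (64 KiB base pages, 2 MiB THP) and the helper name are assumptions for illustration; the kernel's own range rounding may differ.

```c
#include <assert.h>

/* Assumed page sizes for illustration: 64 KiB base pages and 2 MiB
 * transparent huge pages. Neither value comes from the commit message. */
#define BASE_PAGE_SIZE  (64UL * 1024)
#define THP_PAGE_SIZE   (2UL * 1024 * 1024)

/* Number of per-page invalidations needed to cover [start, end) when
 * stepping through the range at the given page size. */
static unsigned long invalidations_for(unsigned long start, unsigned long end,
                                       unsigned long page_size)
{
    unsigned long first, last;

    assert(start < end);
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

    first = start & ~(page_size - 1);                    /* round down */
    last = (end + page_size - 1) & ~(page_size - 1);     /* round up */
    return (last - first) / page_size;
}
```

For a 4 MiB aligned range this gives 64 invalidations in the base-page pass and 2 in the THP pass, so the second pass adds little on top of the first.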
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s/radix: Implement _tlbie(l)_va_range flush functions · d665767e
      Nicholas Piggin authored
      Move the barriers and range iteration down into the _tlbie* level,
      which improves readability.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s/radix: Optimize TLB range flush barriers · 14001c60
      Nicholas Piggin authored
      Short range flushes issue a sequence of tlbie(l) instructions for
      individual effective addresses. These do not all require individual
      barrier sequences, only one covering all tlbie(l) instructions.
      
      Commit f7327e0b ("powerpc/mm/radix: Remove unnecessary ptesync")
      made a similar optimization for tlbiel for PID flushing.
      
      For tlbie, the ISA says:
      
          The tlbsync instruction provides an ordering function for the
          effects of all tlbie instructions executed by the thread executing
          the tlbsync instruction, with respect to the memory barrier
          created by a subsequent ptesync instruction executed by the same
          thread.
      
      Time to munmap 30 pages of memory (after mmap, touch):
               local   global
      vanilla  10.9us  22.3us
      patched   3.4us  14.4us
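
The saving can be pictured with a userspace model that only counts operations. It is a sketch, not kernel code: the struct and function names are hypothetical, and a "barrier sequence" stands in for whatever synchronising instructions surround the invalidations.

```c
#include <assert.h>

/* Counters standing in for real instructions. */
struct flush_cost {
    unsigned long invalidations;      /* tlbie(l) instructions issued */
    unsigned long barrier_sequences;  /* barrier sequences issued */
};

/* Before: every page invalidation carries its own barrier sequence. */
static struct flush_cost cost_barrier_per_page(unsigned long nr_pages)
{
    struct flush_cost c = { 0, 0 };

    assert(nr_pages > 0);
    for (unsigned long i = 0; i < nr_pages; i++) {
        c.invalidations++;
        c.barrier_sequences++;
    }
    return c;
}

/* After: all invalidations are issued back to back, then one barrier
 * sequence covers the whole batch. */
static struct flush_cost cost_single_barrier(unsigned long nr_pages)
{
    struct flush_cost c = { 0, 0 };

    assert(nr_pages > 0);
    for (unsigned long i = 0; i < nr_pages; i++)
        c.invalidations++;
    c.barrier_sequences = 1;
    return c;
}
```

For the 30-page case measured above, the model issues the same 30 invalidations either way but drops from 30 barrier sequences to 1.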
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • Merge branch 'fixes' into next · a54c61f4
      Michael Ellerman authored
      We have some dependencies & conflicts between patches in fixes and
      things to go in next, both in the radix TLB flush code and the IMC PMU
      driver. So merge fixes into next.
  3. 09 Nov, 2017 1 commit
  4. 08 Nov, 2017 1 commit
    • powerpc/xmon: Support dumping software pagetables · 80eff6c4
      Balbir Singh authored
      It would be nice to be able to dump page tables in a particular
      context.
      
      eg: dumping vmalloc space:
      
        0:mon> dv 0xd00037fffff00000
        pgd  @ 0xc0000000017c0000
        pgdp @ 0xc0000000017c00d8 = 0x00000000f10b1000
        pudp @ 0xc0000000f10b13f8 = 0x00000000f10d0000
        pmdp @ 0xc0000000f10d1ff8 = 0x00000000f1102000
        ptep @ 0xc0000000f1102780 = 0xc0000000f1ba018e
        Maps physical address = 0x00000000f1ba0000
        Flags = Accessed Dirty Read Write
      
      This patch does not replicate the complex code of dump_pagetable and
      has no support for bolted linear mapping, which is why it's called
      dump virtual page table support. The format of the PTE can be expanded
      even further to add more useful information about the flags in the PTE
      if required.
      Signed-off-by: Balbir Singh <bsingharora@gmail.com>
      [mpe: Bike shed the output format, show the pgdir, fix build failures]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  5. 07 Nov, 2017 3 commits
  6. 06 Nov, 2017 26 commits
    • powerpc/64s/idle: avoid POWER9 DD1 and DD2.0 PMU workaround on DD2.1 · e3646330
      Nicholas Piggin authored
      DD2.1 does not have to save MMCR0 for all state-loss idle states,
      only after deep idle states (like other PMU registers).
      Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s/idle: avoid POWER9 DD1 and DD2.0 ERAT workaround on DD2.1 · 9d2f510a
      Nicholas Piggin authored
      DD2.1 does not have to flush the ERAT after a state-loss idle.
      
      Performance testing was done on a DD2.1 using only the stop0 idle state
      (the shallowest state which supports state loss), using context_switch
      selftest configured to ping-pong between two threads on the same core
      and two different cores.
      
      Performance improvement for same core is 7.0%, different cores is 14.8%.
      Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: add POWER9_DD20 feature · b6b3755e
      Nicholas Piggin authored
      Cc: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Remove facility loadups on transactional {fp, vec, vsx} unavailable · 6f700d38
      Cyril Bur authored
      After handling a transactional FP, Altivec or VSX unavailable exception,
      the return to userspace code will detect that the TIF_RESTORE_TM bit is
      set and call restore_tm_state(). restore_tm_state() will call
      restore_math() to ensure that the correct facilities are loaded.
      
      This means that all the loadup code in {fp,altivec,vsx}_unavailable_tm()
      is doing pointless work and can simply be removed.
      Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Always save/restore checkpointed regs during treclaim/trecheckpoint · eb5c3f1c
      Cyril Bur authored
      Lazy save and restore of FP/Altivec means that a userspace process can
      be sent to userspace with FP or Altivec disabled and loaded only as
      required (by way of an FP/Altivec unavailable exception). Transactional
      Memory complicates this situation as a transaction could be started
      without FP/Altivec being loaded up. This causes the hardware to
      checkpoint incorrect registers. Handling FP/Altivec unavailable
      exceptions while a thread is transactional requires a reclaim and
      recheckpoint to ensure the CPU has correct state for both sets of
      registers.
      
      tm_reclaim() has optimisations to not always save the FP/Altivec
      registers to the checkpointed save area. This was originally done
      because the caller might have information that the checkpointed
      registers aren't valid due to lazy save and restore. We've also been a
      little vague as to how tm_reclaim() leaves the FP/Altivec state since it
      doesn't necessarily always save it to the thread struct. This has led
      to an (incorrect) assumption that it leaves the checkpointed state on
      the CPU.
      
      tm_recheckpoint() has similar optimisations in reverse. It may not
      always reload the checkpointed FP/Altivec registers from the thread
      struct before the trecheckpoint. It is therefore quite unclear where it
      expects to get the state from. This didn't help with the assumption
      made about tm_reclaim().
      
      These optimisations sit in what is by definition a slow path. If a
      process has to go through a reclaim/recheckpoint then its transaction
      will be doomed on returning to userspace. This means that the process
      will be unable to complete its transaction and be forced to its failure
      handler. This is already an out of line case for userspace. Furthermore,
      the cost of copying 64 times 128 bits from registers isn't very high[0]
      (at all) on modern processors. As such it appears these optimisations
      have only served to increase code complexity and are unlikely to have
      had a measurable performance impact.
      
      Our transactional memory handling has been riddled with bugs. A cause
      of this has been difficulty in following the code flow, code complexity
      has not been our friend here. It makes sense to remove these
      optimisations in favour of a (hopefully) more stable implementation.
      
      This patch does mean that sometimes the assembly will needlessly save
      'junk' registers which will subsequently get overwritten with the
      correct value by the C code which calls the assembly function. This
      small inefficiency is far outweighed by the reduction in complexity for
      general TM code, context switching paths, and transactional facility
      unavailable exception handler.
      
      0: I tried to measure it once for other work and found that it was
      hiding in the noise of everything else I was working with. I find it
      exceedingly likely this will be the case here.
      Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Force reload for recheckpoint during tm {fp, vec, vsx} unavailable exception · 91381b9c
      Cyril Bur authored
      Lazy save and restore of FP/Altivec means that a userspace process can
      be sent to userspace with FP or Altivec disabled and loaded only as
      required (by way of an FP/Altivec unavailable exception). Transactional
      Memory complicates this situation as a transaction could be started
      without FP/Altivec being loaded up. This causes the hardware to
      checkpoint incorrect registers. Handling FP/Altivec unavailable
      exceptions while a thread is transactional requires a reclaim and
      recheckpoint to ensure the CPU has correct state for both sets of
      registers.
      
      tm_reclaim() has optimisations to not always save the FP/Altivec
      registers to the checkpointed save area. This was originally done
      because the caller might have information that the checkpointed
      registers aren't valid due to lazy save and restore. We've also been a
      little vague as to how tm_reclaim() leaves the FP/Altivec state since it
      doesn't necessarily always save it to the thread struct. This has led
      to an (incorrect) assumption that it leaves the checkpointed state on
      the CPU.
      
      tm_recheckpoint() has similar optimisations in reverse. It may not
      always reload the checkpointed FP/Altivec registers from the thread
      struct before the trecheckpoint. It is therefore quite unclear where it
      expects to get the state from. This didn't help with the assumption
      made about tm_reclaim().
      
      This patch is a minimal fix for ease of backporting. A more correct fix
      which removes the msr parameter to tm_reclaim() and tm_recheckpoint()
      altogether has been upstreamed to apply on top of this patch.
      
      Fixes: dc310669 ("powerpc: tm: Always use fp_state and vr_state to
      store live registers")
      Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Don't enable FP/Altivec if not checkpointed · a7771176
      Cyril Bur authored
      Lazy save and restore of FP/Altivec means that a userspace process can
      be sent to userspace with FP or Altivec disabled and loaded only as
      required (by way of an FP/Altivec unavailable exception). Transactional
      Memory complicates this situation as a transaction could be started
      without FP/Altivec being loaded up. This causes the hardware to
      checkpoint incorrect registers. Handling FP/Altivec unavailable
      exceptions while a thread is transactional requires a reclaim and
      recheckpoint to ensure the CPU has correct state for both sets of
      registers.
      
      Lazy save and restore of FP/Altivec cannot be done if a process is
      transactional. If a facility was enabled it must remain enabled whenever
      a thread is transactional.
      
      Commit dc16b553 ("powerpc: Always restore FPU/VEC/VSX if hardware
      transactional memory in use") ensures that the facilities are always
      enabled if a thread is transactional. A bug in the introduced code may
      cause it to inadvertently enable a facility that was (and should remain)
      disabled. The problem with this extraneous enablement is that the
      registers for the erroneously enabled facility have not been correctly
      recheckpointed - the recheckpointing code assumed the facility would
      remain disabled.
      
      Further compounding the issue, the transactional {fp,altivec,vsx}
      unavailable code has been incorrectly using the MSR to enable
      facilities. The presence of the {FP,VEC,VSX} bit in regs->msr simply
      indicates whether the registers are live on the CPU, not whether the kernel should
      load them before returning to userspace. This has worked due to the bug
      mentioned above.
      
      This causes transactional threads which return to their failure handler
      to observe incorrect checkpointed registers. Perhaps an example will
      help illustrate the problem:
      
      A userspace process is running and uses both FP and Altivec registers.
      This process then continues to run for some time without touching
      either set of registers. The kernel subsequently disables the
      facilities as part of lazy save and restore. The userspace process then
      performs a tbegin and the CPU checkpoints 'junk' FP and Altivec
      registers. The process then performs a floating point instruction
      triggering a fp unavailable exception in the kernel.
      
      The kernel then loads the FP registers - and only the FP registers.
      Since the thread is transactional it must perform a reclaim and
      recheckpoint to ensure both the checkpointed registers and the
      transactional registers are correct. It then (correctly) enables
      MSR[FP] for the process. Later (on exception exit) the kernel also
      (inadvertently) enables MSR[VEC]. The process is then returned to
      userspace.
      
      Since the act of loading the FP registers doomed the transaction we know
      the CPU will fail the transaction, restore its checkpointed registers, and
      return the process to its failure handler. The problem is that we're
      now running with Altivec enabled and the 'junk' checkpointed registers
      are restored. The kernel had only recheckpointed FP.
      
      This patch solves this by only activating FP/Altivec if userspace was
      using them when it entered the kernel and not simply if the process is
      transactional.
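
The difference between the old and new behaviour for a transactional thread can be modelled as a choice of MSR enable mask. This is a simplified sketch of the description above, not the kernel's code: the function names and the `in_use` parameter (the facility bits recorded as in use for the thread) are hypothetical.

```c
#include <assert.h>

/* MSR facility bits for 64-bit powerpc. */
#define MSR_FP   (1UL << 13)
#define MSR_VEC  (1UL << 25)

/* Old behaviour as described in the message: being transactional was
 * enough to enable every facility on the way back to userspace. */
static unsigned long tm_enable_mask_old(unsigned long in_use)
{
    (void)in_use;
    return MSR_FP | MSR_VEC;
}

/* New behaviour: only facilities that were actually in use are enabled. */
static unsigned long tm_enable_mask_new(unsigned long in_use)
{
    return in_use & (MSR_FP | MSR_VEC);
}

/* The scenario from the message: only FP was loaded and recheckpointed. */
static void check_fp_only_scenario(void)
{
    assert(tm_enable_mask_old(MSR_FP) & MSR_VEC);     /* VEC leaks in */
    assert(!(tm_enable_mask_new(MSR_FP) & MSR_VEC));  /* VEC stays off */
}
```

In the worked example above, the old mask turns on MSR[VEC] even though only FP was recheckpointed, which is how the junk Altivec registers become visible to the failure handler.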
      
      Fixes: dc16b553 ("powerpc: Always restore FPU/VEC/VSX if hardware
      transactional memory in use")
      Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • mtd: powernv_flash: Use opal_async_wait_response_interruptible() · 6f469b67
      Cyril Bur authored
      The OPAL calls performed in this driver shouldn't be using
      opal_async_wait_response() as this performs a wait_event() which, on
      long running OPAL calls could result in hung task warnings. wait_event()
      prevents timely signal delivery which is also undesirable.
      
      This patch also attempts to quieten down the use of dev_err() when
      errors haven't actually occurred and also to return better information up
      the stack rather than always -EIO.
      Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
      Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv: Add OPAL_BUSY to opal_error_code() · 77adbd22
      Cyril Bur authored
      Also export opal_error_code() so that it can be used in modules
      Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/opal: Add opal_async_wait_response_interruptible() to opal-async · 9aab2449
      Cyril Bur authored
      This patch adds an _interruptible version of opal_async_wait_response().
      This is useful when a long running OPAL call is performed on behalf of
      a userspace thread, for example, the opal_flash_{read,write,erase}
      functions performed by the powernv-flash MTD driver.
      
      It is foreseeable that these functions would take upwards of two
      minutes causing the wait_event() to block long enough to cause hung
      task warnings. Furthermore, wait_event_interruptible() is preferable
      as otherwise there is no way for signals to stop the process which is
      going to be confusing in userspace.
      Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powernv/opal-sensor: remove not needed lock · 95e1bc1d
      Stewart Smith authored
      Parallel sensor reads could run out of async tokens due to
      opal_get_sensor_data grabbing tokens but then doing the sensor
      read behind a mutex, essentially serializing the (possibly
      asynchronous and relatively slow) sensor read.
      
      It turns out that the mutex isn't needed at all, not only
      should the OPAL interface allow concurrent reads, the implementation
      is certainly safe for that, and if any sensor we were reading
      from somewhere isn't, doing the mutual exclusion in the kernel
      is the wrong place to do it, OPAL should be doing it for the kernel.
      
      So, remove the mutex.
      
      Additionally, we shouldn't be printing out an error when we don't
      get a token as the only way this should happen is if we've been
      interrupted in down_interruptible() on the semaphore.
      Reported-by: Robert Lippert <rlippert@google.com>
      Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
      Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/opal: Rework the opal-async interface · 86cd6d98
      Cyril Bur authored
      Future work will add an opal_async_wait_response_interruptible()
      which will call wait_event_interruptible(). This work requires extra
      token state to be tracked as wait_event_interruptible() can return and
      the caller could release the token before OPAL responds.
      
      Currently token state is tracked with two bitfields which are 64 bits
      big but may not need to be as OPAL informs Linux how many async tokens
      there are. It also uses an array indexed by token to store response
      messages for each token.
      
      The bitfields make it difficult to add more state and also provide a
      hard maximum as to how many tokens there can be - it is possible that
      OPAL will inform Linux that there are more than 64 tokens.
      
      Rather than add a bitfield to track the extra state, rework the
      internals slightly.
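
One way to picture such a rework is a per-token state array sized from the count that firmware reports, instead of fixed 64-bit bitfields. The sketch below is a userspace model under that assumption; every name in it is hypothetical and it has no locking, so it is not the kernel's implementation.

```c
#include <assert.h>
#include <stdlib.h>

/* Per-token state; adding another state is a one-line change here,
 * whereas the bitfield scheme would need another bitmap. */
enum token_state { TOKEN_FREE = 0, TOKEN_ALLOCATED, TOKEN_COMPLETED };

struct token_pool {
    enum token_state *state;
    int count;   /* number of tokens reported by firmware; may exceed 64 */
};

static int pool_init(struct token_pool *p, int count)
{
    assert(count > 0);
    /* calloc zeroes the array, and zero is TOKEN_FREE */
    p->state = calloc((size_t)count, sizeof(*p->state));
    if (!p->state)
        return -1;
    p->count = count;
    return 0;
}

/* Hand out the lowest-numbered free token, or -1 if none is free. */
static int pool_get(struct token_pool *p)
{
    for (int i = 0; i < p->count; i++) {
        if (p->state[i] == TOKEN_FREE) {
            p->state[i] = TOKEN_ALLOCATED;
            return i;
        }
    }
    return -1;
}

/* Record that the response for an allocated token has arrived. */
static void pool_complete(struct token_pool *p, int token)
{
    assert(token >= 0 && token < p->count);
    assert(p->state[token] == TOKEN_ALLOCATED);
    p->state[token] = TOKEN_COMPLETED;
}

/* Return a token to the pool. */
static void pool_release(struct token_pool *p, int token)
{
    assert(token >= 0 && token < p->count);
    assert(p->state[token] != TOKEN_FREE);
    p->state[token] = TOKEN_FREE;
}

static void pool_destroy(struct token_pool *p)
{
    free(p->state);
    p->state = NULL;
    p->count = 0;
}
```

Because the array is allocated at init time, a pool of 100 tokens works the same way as a pool of 8, which a pair of 64-bit bitfields could not express.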
      Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
      [mpe: Fix __opal_async_get_token() when no tokens are free]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/opal: Make __opal_async_{get, release}_token() static · 59cf9a1c
      Cyril Bur authored
      There are no callers of both __opal_async_get_token() and
      __opal_async_release_token().
      
      This patch also removes the possibility of "emergency through
      synchronous call to __opal_async_get_token()" as such it makes more
      sense to initialise opal_sync_sem for the maximum number of async
      tokens.
      Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • mtd: powernv_flash: Don't return -ERESTARTSYS on interrupted token acquisition · efe69414
      Cyril Bur authored
      Because the MTD core might split up a read() or write() from userspace
      into several calls to the driver, we may fail to get a token but already
      have done some work, best to return -EINTR back to userspace and have
      them decide what to do.
      Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
      Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • mtd: powernv_flash: Remove pointless goto in driver init · e32ec15a
      Cyril Bur authored
      powernv_flash_probe() has pointless goto statements which jump to the
      end of the function to simply return a variable. Rather than checking
      for error and going to the label, just return the error as soon as it is
      detected.
      Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
      Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • mtd: powernv_flash: Don't treat OPAL_SUCCESS as an error · 25ee52e6
      Cyril Bur authored
      While this driver expects to interact asynchronously, OPAL is well
      within its rights to return OPAL_SUCCESS to indicate that the operation
      completed without the need for a callback. We shouldn't treat
      OPAL_SUCCESS as an error rather we should wrap up and return promptly to
      the caller.
      Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
      Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • mtd: powernv_flash: Use WARN_ON_ONCE() rather than BUG_ON() · 44e2aa2b
      Cyril Bur authored
      BUG_ON() should be reserved for situations where we can no longer
      guarantee the integrity of the system. In the case where
      powernv_flash_async_op() receives an impossible op, we can still
      guarantee the integrity of the system.
      Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
      Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/opal: Fix EBUSY bug in acquiring tokens · 71e24d77
      William A. Kennington III authored
      The current code checks the completion map to look for the first token
      that is complete. In some cases, a completion can come in but the
      token can still be on lease to the caller processing the completion.
      If this completed but unreleased token is the first token found in the
      bitmap by another task trying to acquire a token, then the
      __test_and_set_bit call will fail since the token will still be on
      lease. The acquisition will then fail with an EBUSY.
      
      This patch reorganizes the acquisition code to look at the
      opal_async_token_map for an unleased token. If the token has no lease
      it must have no outstanding completions so we should never see an
      EBUSY, unless we have leased out too many tokens. Since
      opal_async_get_token_interruptible is protected by a semaphore, we
      will practically never see EBUSY anymore.
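
The failure mode can be reproduced in a small userspace model with two 64-bit maps. The struct, helper names and the NO_TOKEN return value are hypothetical; the model only mirrors the lookup order described above and has no locking.

```c
#include <assert.h>
#include <stdint.h>

#define NO_TOKEN (-1)   /* stands in for the -EBUSY failure */

struct token_maps {
    uint64_t leased;      /* bit set: token is held by a caller */
    uint64_t completed;   /* bit set: no operation outstanding on the token */
};

/* Index of the lowest set bit, or NO_TOKEN if the map is empty. */
static int first_set_bit(uint64_t map)
{
    for (int i = 0; i < 64; i++)
        if ((map >> i) & 1)
            return i;
    return NO_TOKEN;
}

/* Mark the token as leased with an operation outstanding. */
static void take(struct token_maps *m, int token)
{
    assert(token >= 0 && token < 64);
    m->leased |= UINT64_C(1) << token;
    m->completed &= ~(UINT64_C(1) << token);
}

/* Old lookup: search the completion map, then try to lease that token. */
static int acquire_old(struct token_maps *m)
{
    int t = first_set_bit(m->completed);

    if (t == NO_TOKEN)
        return NO_TOKEN;
    if (m->leased & (UINT64_C(1) << t))
        return NO_TOKEN;   /* completed but still leased: spurious failure */
    take(m, t);
    return t;
}

/* New lookup: search the lease map for a token nobody holds. */
static int acquire_new(struct token_maps *m)
{
    int t = first_set_bit(~m->leased);

    if (t == NO_TOKEN)
        return NO_TOKEN;   /* every token really is leased */
    take(m, t);
    return t;
}
```

With token 0 completed but not yet released, the old lookup fails even though 63 tokens are free, while the new lookup simply returns token 1.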
      
      Fixes: 8d724823 ("powerpc/powernv: Infrastructure to support OPAL async completion")
      Signed-off-by: William A. Kennington III <wak@google.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/eeh: Stop using do_gettimeofday() · edfd17ff
      Arnd Bergmann authored
      This interface is inefficient and deprecated because of the y2038
      overflow.
      
      ktime_get_seconds() is an appropriate replacement here, since it
      has sufficient granularity but is more efficient and uses monotonic
      time.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
      Acked-by: Russell Currey <ruscur@russell.cc>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • cxl: Rework the implementation of cxl_stop_trace_psl9() · cbb55eeb
      Vaibhav Jain authored
      Presently the PSL9 specific cxl_stop_trace_psl9() only stops the RX0
      traces on the CXL adapter when a PSL error irq is triggered. The patch
      updates the function to stop all the trace arrays and move them to
      the FIN state. The implementation issues the mmio to the TRACECFG register
      to stop a trace array iff it is not already in the FIN state. This prevents
      the issue of trace data being reset in case of multiple stop mmio
      issued for a single trace array.
      
      Also the patch does some refactoring of existing cxl_stop_trace_psl9()
      and cxl_stop_trace_psl8() functions by moving them to 'pci.c' from
      'debugfs.c' file and marking them as static.
      Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
      Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      cbb55eeb
    • Sandipan Das's avatar
      bpf: take advantage of stack_depth tracking in powerpc JIT · ac0761eb
      Sandipan Das authored
      Take advantage of stack_depth tracking, originally introduced for
      x64, in the powerpc JIT as well. Round the allocated stack up to a
      multiple of 16 bytes to make sure it stays aligned for functions
      called from a JITed bpf program.
      Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
      Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      ac0761eb
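The 16-byte rounding mentioned above is ordinary power-of-two alignment arithmetic. A minimal sketch follows, assuming the tracked depth is an unsigned byte count; the helper and macro names are hypothetical and this is not the JIT's actual code.

```c
#include <stdint.h>

#define SKETCH_STACK_ALIGN 16u /* alignment quoted in the commit message */

/*
 * Round a tracked stack depth up to the next multiple of 16 bytes.
 * Works because the alignment is a power of two: adding (align - 1)
 * and masking off the low bits rounds up without a division.
 */
static uint32_t sketch_round_up_stack(uint32_t stack_depth)
{
    return (stack_depth + SKETCH_STACK_ALIGN - 1u) & ~(SKETCH_STACK_ALIGN - 1u);
}
```

A program that only needs 40 bytes of stack would therefore get 48 bytes, rather than a fixed worst-case allocation.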
    • Michael Ellerman's avatar
      powerpc/tm: Don't check for WARN in TM Bad Thing handling · 632f0574
      Michael Ellerman authored
      Currently when we take a TM Bad Thing program check exception, we
      search the bug table to see if the program check was generated by a
      WARN/WARN_ON etc.
      
      That makes no sense: the WARN macros use trap instructions, which
      should never generate a TM Bad Thing exception. If they ever did,
      that would be a bug and we should oops.
      
      We do have some hand-coded bugs in tm.S, using EMIT_BUG_ENTRY, but
      those are all BUGs not WARNs, and they all use trap instructions
      anyway. Almost certainly this check was incorrectly copied from the
      REASON_TRAP handling in the same function.
      
      Remove it.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Acked-By: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      632f0574
    • Michael Ellerman's avatar
      powerpc/mm: Add a CONFIG option to choose if radix is used by default · 1fd6c022
      Michael Ellerman authored
      Currently if the hardware supports the radix MMU we will use
      it, *unless* "disable_radix" is passed on the kernel command line.
      
      However, some users would like the reverse semantics, i.e. the kernel
      uses the hash MMU by default, unless radix is explicitly requested on
      the command line.
      
      So add a CONFIG option to choose whether we use radix by default or
      not, and expand the disable_radix command line option to allow
      "disable_radix=no" which *enables* radix.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      1fd6c022
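The decision described above combines a build-time default with an optional command-line override. The sketch below models only that logic and assumes the hardware supports radix. The function name and parameters are hypothetical, and the handling of `"yes"` and of unrecognized values is an assumption, since the commit message only spells out the bare option and `disable_radix=no`.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Decide whether to use the radix MMU.
 *
 * radix_by_default: stand-in for the new CONFIG option.
 * option_present:   whether "disable_radix" appeared on the command line.
 * value:            text after '=', or NULL for a bare "disable_radix".
 */
static bool sketch_use_radix(bool radix_by_default, bool option_present,
                             const char *value)
{
    if (!option_present)
        return radix_by_default;   /* no override: follow the config */
    if (value == NULL)
        return false;              /* bare "disable_radix" disables radix */
    if (strcmp(value, "no") == 0)
        return true;               /* "disable_radix=no" enables radix */
    if (strcmp(value, "yes") == 0)
        return false;              /* assumption: explicit "yes" disables */
    return radix_by_default;       /* assumption: unknown values are ignored */
}
```

With a hash-by-default build, `sketch_use_radix(false, true, "no")` yields radix, which is the reverse semantics the commit message asks for.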
    • Michael Ellerman's avatar
      powerpc/64s: Replace CONFIG_PPC_STD_MMU_64 with CONFIG_PPC_BOOK3S_64 · 4e003747
      Michael Ellerman authored
      CONFIG_PPC_STD_MMU_64 indicates support for the "standard" powerpc MMU
      on 64-bit CPUs. The "standard" MMU refers to the hash page table MMU
      found in "server" processors, from IBM mainly.
      
      Currently CONFIG_PPC_STD_MMU_64 is == CONFIG_PPC_BOOK3S_64. While it's
      annoying to have two symbols that always have the same value, it's not
      quite annoying enough to bother removing one.
      
      However with the arrival of Power9, we now have the situation where
      CONFIG_PPC_STD_MMU_64 is enabled, but the kernel is running using the
      Radix MMU - *not* the "standard" MMU. So it is now actively confusing
      to use it, because it implies that code is disabled or inactive when
      the Radix MMU is in use, however that is not necessarily true.
      
      So s/CONFIG_PPC_STD_MMU_64/CONFIG_PPC_BOOK3S_64/, and do some minor
      formatting updates of some of the affected lines.
      
      This will be a pain for backports, but c'est la vie.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      4e003747
    • Michael Ellerman's avatar
      powerpc/64: Free up CPU_FTR_ICSWX · c1807e3f
      Michael Ellerman authored
      The last user of CPU_FTR_ICSWX was removed in commit
      6ff4d3e9 ("powerpc: Remove old unused icswx based coprocessor
      support"), so free the bit up for future use.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      c1807e3f
    • Aneesh Kumar K.V's avatar
      powerpc/mm/hash: Add pr_fmt() to hash_utils64.c · 7f142661
      Aneesh Kumar K.V authored
      Make the printks look a bit nicer by adding a prefix.
      
      The radix config currently prints:
       radix-mmu: Page sizes from device-tree:
       radix-mmu: Page size shift = 12 AP=0x0
       radix-mmu: Page size shift = 16 AP=0x5
       radix-mmu: Page size shift = 21 AP=0x1
       radix-mmu: Page size shift = 30 AP=0x2
      
      This patch updates the hash config to produce similar dmesg output. With the patch we have:
      
       hash-mmu: Page sizes from device-tree:
       hash-mmu: base_shift=12: shift=12, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=0
       hash-mmu: base_shift=12: shift=16, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=7
       hash-mmu: base_shift=12: shift=24, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=56
       hash-mmu: base_shift=16: shift=16, sllp=0x0110, avpnm=0x00000000, tlbiel=1, penc=1
       hash-mmu: base_shift=16: shift=24, sllp=0x0110, avpnm=0x00000000, tlbiel=1, penc=8
       hash-mmu: base_shift=20: shift=20, sllp=0x0111, avpnm=0x00000000, tlbiel=0, penc=2
       hash-mmu: base_shift=24: shift=24, sllp=0x0100, avpnm=0x00000001, tlbiel=0, penc=0
       hash-mmu: base_shift=34: shift=34, sllp=0x0120, avpnm=0x000007ff, tlbiel=0, penc=3
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      7f142661
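The prefixing shown above relies on the kernel's `pr_fmt()` convention, where a file defines `pr_fmt` so that its messages pick up a common prefix. The sketch below imitates that idea in userspace with `snprintf`; `sketch_pr_info` is a hypothetical helper, not a kernel API, and it expects a string-literal format plus at least one argument.

```c
#include <stdio.h>

/* Per-file prefix, in the style of the kernel's pr_fmt() convention. */
#define pr_fmt(fmt) "hash-mmu: " fmt

/*
 * Hypothetical userspace helper: format a message into buf with the
 * prefix applied. String-literal concatenation joins the prefix and
 * the format at compile time, so fmt must be a literal.
 */
#define sketch_pr_info(buf, size, fmt, ...) \
    snprintf((buf), (size), pr_fmt(fmt), __VA_ARGS__)
```

For example, calling `sketch_pr_info(buf, sizeof(buf), "base_shift=%d: shift=%d", 12, 16)` writes a line that starts with the `hash-mmu: ` prefix, so call sites do not have to repeat it.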