1. 01 Sep, 2017 22 commits
    • powerpc/32: remove a NOP from memset() · ad1b0122
      Christophe Leroy authored
      memset() is patched after initialisation to activate the
      optimised part which uses cache instructions.
      
      Today we have a 'b 2f' to skip the optimised part, which then gets
      replaced by a NOP, which wastes a cycle.
      As we have a 'bne 2f' just before, we could use that instruction
      for the live patching, hence removing the need to have a
      dedicated 'b 2f' to be replaced by a NOP.
      
      This patch replaces the 'bne 2f' with a 'b 2f'. During init, that
      'b 2f' is then replaced by 'bne 2f'.
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/32: optimise memset() · 7bf6057b
      Christophe Leroy authored
      There is no need to extend the set value to an int when the length
      is lower than 4 as in that case we only do byte stores.
      We can therefore immediately branch to the part handling it.
      By separating it from the normal case, we are able to eliminate
      a few actions on the destination pointer.
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: fix location of two EXPORT_SYMBOL · c0622167
      Christophe Leroy authored
      Commit 9445aa1a ("ppc: move exports to definitions")
      added EXPORT_SYMBOL() for memset() and flush_hash_pages() in
      the middle of the functions.
      
      This patch moves them to the end of the two functions.
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/32: add memset16() · da74f659
      Christophe Leroy authored
      Commit 694fc88c ("powerpc/string: Implement optimized
      memset variants") added memset16(), memset32() and memset64()
      for 64-bit PPC.
      
      On 32-bit, memset64() is not relevant, and as shown below,
      the generic version of memset32() produces good code, so only
      memset16() is a candidate for an optimised version.
      
      000009c0 <memset32>:
       9c0:   2c 05 00 00     cmpwi   r5,0
       9c4:   39 23 ff fc     addi    r9,r3,-4
       9c8:   4d 82 00 20     beqlr
       9cc:   7c a9 03 a6     mtctr   r5
       9d0:   94 89 00 04     stwu    r4,4(r9)
       9d4:   42 00 ff fc     bdnz    9d0 <memset32+0x10>
       9d8:   4e 80 00 20     blr
      
      The last part of memset(), which handles lengths that are not
      multiples of 4 bytes, operates on bytes, making it unsuitable for
      handling 16-bit values without modification. As that would increase
      the complexity of memset(), it is better to implement memset16()
      from scratch. This also allows a more optimised memset16() than
      we would get by reusing the memset() function.
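      As an illustration only (not part of the commit), the generic
      memset32() loop and one possible shape for a dedicated memset16()
      can be sketched in C. The names generic_memset32() and
      sketch_memset16() are hypothetical, and the word-pairing strategy
      is an assumption about what "more optimised" could mean here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Plain loop, comparable to the generic memset32() disassembled above. */
static uint32_t *generic_memset32(uint32_t *s, uint32_t v, size_t count)
{
    uint32_t *p = s;

    while (count--)
        *p++ = v;
    return s;
}

/*
 * Hypothetical memset16(): duplicate the 16-bit value into a 32-bit
 * word, store whole words where possible, and finish with at most one
 * halfword store. memcpy() is used for the word store so the sketch
 * stays valid C regardless of alignment and aliasing rules.
 */
static uint16_t *sketch_memset16(uint16_t *s, uint16_t v, size_t count)
{
    uint16_t *p = s;
    uint32_t w = ((uint32_t)v << 16) | v;  /* same halfword twice */

    if (count && ((uintptr_t)p & 2)) {     /* align to a word first */
        *p++ = v;
        count--;
    }
    for (; count >= 2; count -= 2, p += 2)
        memcpy(p, &w, sizeof(w));
    if (count)                             /* odd trailing halfword */
        *p = v;
    return s;
}
```

      Because both halves of the word hold the same value, the word
      store gives the same result on big-endian and little-endian hosts.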
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Wrap register number correctly for string load/store instructions · 45f62159
      Paul Mackerras authored
      Michael Ellerman reported that emulate_loadstore() was trying to
      access element 32 of regs->gpr[], which doesn't exist, when
      emulating a string store instruction.  This is because the string
      load and store instructions (lswi, lswx, stswi and stswx) are
      defined to wrap around from register 31 to register 0 if the number
      of bytes being loaded or stored is sufficiently large.  This wrapping
      was not implemented in the emulation code.  To fix it, we mask the
      register number after incrementing it.
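      A minimal sketch of the wrap-around, assuming a simplified
      32-entry register file and a big-endian byte order. The type
      gpr_file and the function store_string() are hypothetical and are
      not the kernel's emulate_loadstore() code.

```c
#include <stdint.h>

/* Hypothetical stand-in for the 32 general-purpose registers. */
struct gpr_file {
    uint32_t gpr[32];
};

/*
 * Store nb bytes starting at register rd into buf, four bytes per
 * register, most-significant byte first. After r31 the sequence
 * continues at r0, as the string store instructions are defined to do.
 */
static void store_string(const struct gpr_file *regs, unsigned int rd,
                         unsigned int nb, uint8_t *buf)
{
    unsigned int i = 0;

    while (i < nb) {
        uint32_t v = regs->gpr[rd];

        for (int shift = 24; shift >= 0 && i < nb; shift -= 8)
            buf[i++] = (uint8_t)(v >> shift);
        rd = (rd + 1) & 0x1f;   /* mask after incrementing: 31 -> 0 */
    }
}
```

      Without the mask, a transfer that starts at r31 and needs more
      than four bytes would index element 32 of the array.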
      Reported-by: Michael Ellerman <mpe@ellerman.id.au>
      Fixes: c9f6f4ed ("powerpc: Implement emulation of string loads and stores")
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Emulate load/store floating point as integer word instructions · d2b65ac6
      Paul Mackerras authored
      This adds emulation for the lfiwax, lfiwzx and stfiwx instructions.
      This necessitated adding a new flag to indicate whether a floating
      point or an integer conversion was needed for LOAD_FP and STORE_FP,
      so this moves the size field in op->type up 4 bits.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Use instruction emulation infrastructure to handle alignment faults · 31bfdb03
      Paul Mackerras authored
      This replaces almost all of the instruction emulation code in
      fix_alignment() with calls to analyse_instr(), emulate_loadstore()
      and emulate_dcbz().  The only emulation code left is the SPE
      emulation code; analyse_instr() etc. do not handle SPE instructions
      at present.
      
      One result of this is that we can now handle alignment faults on
      all the new VSX load and store instructions that were added in POWER9.
      VSX loads/stores will take alignment faults for unaligned accesses
      to cache-inhibited memory.
      
      Another effect is that we no longer rely on the DAR and DSISR values
      set by the processor.
      
      With this, we now need to include the instruction emulation code
      unconditionally.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Separate out load/store emulation into its own function · a53d5182
      Paul Mackerras authored
      This moves the parts of emulate_step() that deal with emulating
      load and store instructions into a new function called
      emulate_loadstore().  This is to make it possible to reuse this
      code in the alignment handler.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Handle opposite-endian processes in emulation code · d955189a
      Paul Mackerras authored
      This adds code to the load and store emulation code to byte-swap
      the data appropriately when the process being emulated is set to
      the opposite endianness to that of the kernel.
      
      This also enables the emulation for the multiple-register loads
      and stores (lmw, stmw, lswi, stswi, lswx, stswx) to work for
      little-endian.  In little-endian mode, the partial word at the
      end of a transfer for lsw*/stsw* (when the byte count is not a
      multiple of 4) is loaded/stored at the least-significant end of
      the register.  Additionally, this fixes a bug in the previous
      code in that it could call read_mem/write_mem with a byte count
      that was not 1, 2, 4 or 8.
      
      Note that this only works correctly on processors with "true"
      little-endian mode, such as IBM POWER processors from POWER6 on, not
      the so-called "PowerPC" little-endian mode that uses address swizzling
      as implemented on the old 32-bit 603, 604, 740/750, 74xx CPUs.
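      The byte-swapping idea can be sketched as follows, assuming a
      big-endian emulator and a flag saying whether the emulated process
      runs in the opposite endianness. reverse_bytes() and
      emulate_load_word() are hypothetical names, not functions from
      sstep.c.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reverse the nb bytes at ptr in place. */
static void reverse_bytes(void *ptr, size_t nb)
{
    uint8_t *p = ptr;

    for (size_t i = 0; i < nb / 2; i++) {
        uint8_t t = p[i];

        p[i] = p[nb - 1 - i];
        p[nb - 1 - i] = t;
    }
}

/*
 * Load a 32-bit word from mem the way a big-endian emulator would,
 * swapping the bytes first when the process being emulated is in the
 * opposite (little-endian) mode.
 */
static uint32_t emulate_load_word(const uint8_t *mem, int cross_endian)
{
    uint8_t tmp[4];

    memcpy(tmp, mem, sizeof(tmp));
    if (cross_endian)
        reverse_bytes(tmp, sizeof(tmp));
    /* Assemble big-endian explicitly so the sketch is host-independent. */
    return ((uint32_t)tmp[0] << 24) | ((uint32_t)tmp[1] << 16) |
           ((uint32_t)tmp[2] << 8) | tmp[3];
}
```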
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Set regs->dar if memory access fails in emulate_step() · b9da9c8a
      Paul Mackerras authored
      This adds code to the instruction emulation code to set regs->dar
      to the address of any memory access that fails.  This address is
      not necessarily the same as the effective address of the instruction,
      because if the memory access is unaligned, it might cross a page
      boundary and fault on the second page.
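      To illustrate why the failing address can differ from the
      effective address, here is a small model. It assumes 4096-byte
      pages, an access no longer than one page, and a caller-supplied
      predicate that says whether a page is accessible. All names
      (SKETCH_PAGE_SIZE, check_access(), only_first_page()) are
      hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumed page size */

typedef bool (*page_ok_fn)(uint64_t page_base);

/*
 * Model an nb-byte access at ea (1 <= nb <= SKETCH_PAGE_SIZE).
 * Returns true on success. On failure, *dar receives the first
 * address that could not be accessed.
 */
static bool check_access(uint64_t ea, unsigned int nb, page_ok_fn ok,
                         uint64_t *dar)
{
    uint64_t mask = ~(uint64_t)(SKETCH_PAGE_SIZE - 1);
    uint64_t first = ea & mask;
    uint64_t last = (ea + nb - 1) & mask;

    if (!ok(first)) {
        *dar = ea;              /* fails immediately at the EA */
        return false;
    }
    if (last != first && !ok(last)) {
        *dar = last;            /* fails at the start of the second page */
        return false;
    }
    return true;
}

/* Example predicate: only the page at address 0 is mapped. */
static bool only_first_page(uint64_t page_base)
{
    return page_base == 0;
}
```

      With this predicate, an 8-byte access at 4092 fails at 4096 rather
      than at 4092, which is the situation the commit message describes.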
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Emulate the dcbz instruction · b2543f7b
      Paul Mackerras authored
      This adds code to analyse_instr() and emulate_step() to understand the
      dcbz (data cache block zero) instruction.  The emulate_dcbz() function
      is made public so it can be used by the alignment handler in future.
      (The apparently unnecessary cropping of the address to 32 bits is
      there because it will be needed in that situation.)
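      A rough model of what emulating dcbz involves, operating on a
      simulated memory array instead of real user memory. sketch_dcbz(),
      its parameters, and the block size passed in are assumptions made
      for illustration; the kernel's emulate_dcbz() has a different
      signature.

```c
#include <stdint.h>
#include <string.h>

/*
 * Zero the cache block containing ea inside the simulated memory mem.
 * block_size must be a power of two and mem must cover the whole block.
 * When mode32 is set, the address is first cropped to 32 bits.
 */
static void sketch_dcbz(uint8_t *mem, uint64_t ea, uint32_t block_size,
                        int mode32)
{
    if (mode32)
        ea &= 0xffffffffULL;               /* crop the address */
    ea &= ~(uint64_t)(block_size - 1);     /* round down to block start */
    memset(mem + ea, 0, block_size);
}
```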
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Emulate load/store floating double pair instructions · 1f41fb79
      Paul Mackerras authored
      This adds lfdp[x] and stfdp[x] to the set of instructions that
      analyse_instr() and emulate_step() understand.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Emulate vector element load/store instructions · e61ccc7b
      Paul Mackerras authored
      This adds code to analyse_instr() and emulate_step() to handle the
      vector element loads and stores:
      
      lvebx, lvehx, lvewx, stvebx, stvehx, stvewx.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Emulate FP/vector/VSX loads/stores correctly when regs not live · c22435a5
      Paul Mackerras authored
      At present, the analyse_instr/emulate_step code checks for the
      relevant MSR_FP/VEC/VSX bit being set when a FP/VMX/VSX load
      or store is decoded, but doesn't recheck the bit before reading or
      writing the relevant FP/VMX/VSX register in emulate_step().
      
      Since we don't have preemption disabled, it is possible that we get
      preempted between checking the MSR bit and doing the register access.
      If that happened, then the registers would have been saved to the
      thread_struct for the current process.  Accesses to the CPU registers
      would then potentially read stale values, or write values that would
      never be seen by the user process.
      
      Another way that the registers can become non-live is if a page
      fault occurs when accessing user memory, and the page fault code
      calls a copy routine that wants to use the VMX or VSX registers.
      
      To fix this, the code for all the FP/VMX/VSX loads gets restructured
      so that it forms an image in a local variable of the desired register
      contents, then disables preemption, checks the MSR bit and either
      sets the CPU register or writes the value to the thread struct.
      Similarly, the code for stores checks the MSR bit, copies either the
      CPU register or the thread struct to a local variable, then reenables
      preemption and then copies the register image to memory.
      
      If the instruction being emulated is in the kernel, then we must not
      use the register values in the thread_struct.  In this case, if the
      relevant MSR enable bit is not set, then emulate_step refuses to
      emulate the instruction.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Make load/store emulation use larger memory accesses · e0a0986b
      Paul Mackerras authored
      At the moment, emulation of loads and stores of up to 8 bytes to
      unaligned addresses on a little-endian system uses a sequence of
      single-byte loads or stores to memory.  This is rather inefficient,
      and the code is hard to follow because it has many ifdefs.
      In addition, the Power ISA has requirements on how unaligned accesses
      are performed, which are not met by doing all accesses as
      sequences of single-byte accesses.
      
      Emulation of VSX loads and stores uses __copy_{to,from}_user,
      which means the emulation code has no control over the size of
      accesses.
      
      To simplify this, we add new copy_mem_in() and copy_mem_out()
      functions for accessing memory.  These use a sequence of the largest
      possible aligned accesses, up to 8 bytes (or 4 on 32-bit systems),
      to copy memory between a local buffer and user memory.  We then
      rewrite {read,write}_mem_unaligned and the VSX load/store
      emulation using these new functions.
      
      These new functions also simplify the code in do_fp_load() and
      do_fp_store() for the unaligned cases.
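      The "largest possible aligned accesses" strategy can be sketched
      as below. This is an illustration under assumptions, not the
      kernel's copy_mem_in(): ea is a simulated target address used only
      to decide chunk sizes, the cap is fixed at 8 bytes, and the names
      max_align() and copy_chunks() are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Largest power of two, capped at 8, that divides x. */
static uint64_t max_align(uint64_t x)
{
    x |= 8;             /* guarantees a set bit at or below bit 3 */
    return x & -x;      /* isolate the lowest set bit */
}

/*
 * Copy nb bytes from src to dest, choosing for each step the largest
 * naturally aligned chunk (1, 2, 4 or 8 bytes) that fits at the
 * simulated address ea. The size of every chunk is recorded in sizes,
 * which needs room for nb entries. Returns the number of chunks.
 */
static unsigned int copy_chunks(uint8_t *dest, const uint8_t *src,
                                uint64_t ea, unsigned int nb,
                                unsigned int *sizes)
{
    unsigned int n = 0;

    while (nb > 0) {
        unsigned int c = (unsigned int)max_align(ea);

        while (c > nb)          /* shrink until the chunk fits */
            c >>= 1;
        memcpy(dest, src, c);
        sizes[n++] = c;
        dest += c;
        src += c;
        ea += c;
        nb -= c;
    }
    return n;
}
```

      For an 8-byte transfer at an address ending in 3, this produces
      chunks of 1, 4, 2 and 1 bytes instead of eight single-byte
      accesses.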
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Add emulation for the addpcis instruction · 958465ee
      Paul Mackerras authored
      The addpcis instruction puts the sum of the next instruction address
      plus a constant into a register.  Since the result depends on the
      address of the instruction, it will give an incorrect result if it
      is single-stepped out of line, which is what the *probes subsystem
      will currently do if a probe is placed on an addpcis instruction.
      This fixes the problem by adding emulation of it to analyse_instr().
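      The address dependence is easy to see in a small sketch. It
      assumes the 16-bit immediate has already been reassembled from the
      instruction fields and is passed in as d; addpcis_result() is a
      hypothetical helper, not a kernel function.

```c
#include <stdint.h>

/*
 * Result of addpcis for an instruction located at pc:
 * next instruction address plus the sign-extended immediate
 * shifted left by 16 bits.
 */
static uint64_t addpcis_result(uint64_t pc, int16_t d)
{
    uint64_t nia = pc + 4;   /* address of the next instruction */

    /* Sign-extend first, then shift as unsigned to avoid undefined
     * behaviour on negative values. */
    return nia + ((uint64_t)(int64_t)d << 16);
}
```

      Running the same instruction from a different address, as happens
      when it is single-stepped out of line, changes pc and therefore
      the result.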
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Don't update CR0 in emulation of popcnt, prty, bpermd instructions · 5762e083
      Paul Mackerras authored
      The architecture shows the least-significant bit of the instruction
      word as reserved for the popcnt[bwd], prty[wd] and bpermd
      instructions, that is, these instructions never update CR0.
      Therefore this changes the emulation of these instructions to
      skip the CR0 update.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Fix emulation of the isel instruction · f1bbb99f
      Paul Mackerras authored
      The case added for the isel instruction was added inside a switch
      statement which uses the 10-bit minor opcode field in the 0x7fe
      bits of the instruction word.  However, for the isel instruction,
      the minor opcode field is only the 0x3e bits, and the 0x7c0 bits
      are used for the "BC" field, which indicates which CR bit to use
      to select the result.
      
      Therefore, for the isel emulation to work correctly when BC != 0,
      we need to match on ((instr >> 1) & 0x1f) == 15.  To do this, we
      pull the isel case out of the switch statement and put it in an
      if statement of its own.
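      The difference between the two matches can be checked with a short
      sketch. encode_isel(), is_isel(), is_isel_10bit() and isel_bc()
      are hypothetical helpers written for this illustration; the field
      positions follow the bit masks quoted in the message above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Build an isel instruction word: primary opcode 31, 5-bit minor opcode 15. */
static uint32_t encode_isel(unsigned int rt, unsigned int ra,
                            unsigned int rb, unsigned int bc)
{
    return (31u << 26) | (rt << 21) | (ra << 16) | (rb << 11) |
           (bc << 6) | (15u << 1);
}

/* Correct match: only the 0x3e bits form the minor opcode. */
static bool is_isel(uint32_t instr)
{
    return (instr >> 26) == 31 && ((instr >> 1) & 0x1f) == 15;
}

/* Match on the full 10-bit field, which also covers the BC bits. */
static bool is_isel_10bit(uint32_t instr)
{
    return (instr >> 26) == 31 && ((instr >> 1) & 0x3ff) == 15;
}

/* BC field: the 0x7c0 bits, selecting the CR bit to test. */
static unsigned int isel_bc(uint32_t instr)
{
    return (instr >> 6) & 0x1f;
}
```

      The 10-bit match only recognises the instruction when BC is 0,
      while the 5-bit match recognises it for every BC value.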
      
      Fixes: e27f71e5 ("powerpc/lib/sstep: Add isel instruction emulation")
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64: Fix update forms of loads and stores to write 64-bit EA · d120cdbc
      Paul Mackerras authored
      When a 64-bit processor is executing in 32-bit mode, the update forms
      of load and store instructions are required by the architecture to
      write the full 64-bit effective address into the RA register, though
      only the bottom 32 bits are used to address memory.  Currently,
      the instruction emulation code writes the truncated address to the
      RA register.  This fixes it by keeping the full 64-bit EA in the
      instruction_op structure, truncating the address in emulate_step()
      where it is used to address memory, rather than in the address
      computations in analyse_instr().
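      A minimal sketch of the corrected behaviour, assuming a simple
      displacement-form update instruction. The struct update_result and
      the function emulate_update_form() are hypothetical and only model
      where the truncation happens.

```c
#include <stdbool.h>
#include <stdint.h>

struct update_result {
    uint64_t new_ra;     /* value written back to RA */
    uint64_t mem_addr;   /* address actually used for the access */
};

/*
 * Compute the effective address in full 64-bit arithmetic. RA always
 * receives the full value. Only the address used for the memory access
 * is truncated when the processor is in 32-bit mode.
 */
static struct update_result emulate_update_form(uint64_t ra, int16_t disp,
                                                bool mode64)
{
    struct update_result r;
    uint64_t ea = ra + (uint64_t)(int64_t)disp;

    r.new_ra = ea;
    r.mem_addr = mode64 ? ea : (ea & 0xffffffffULL);
    return r;
}
```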
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Handle most loads and stores in instruction emulation code · 350779a2
      Paul Mackerras authored
      This extends the instruction emulation infrastructure in sstep.c to
      handle all the load and store instructions defined in the Power ISA
      v3.0, except for the atomic memory operations, ldmx (which was never
      implemented), lfdp/stfdp, and the vector element load/stores.
      
      The instructions added are:
      
      Integer loads and stores: lbarx, lharx, lqarx, stbcx., sthcx., stqcx.,
      lq, stq.
      
      VSX loads and stores: lxsiwzx, lxsiwax, stxsiwx, lxvx, lxvl, lxvll,
      lxvdsx, lxvwsx, stxvx, stxvl, stxvll, lxsspx, lxsdx, stxsspx, stxsdx,
      lxvw4x, lxsibzx, lxvh8x, lxsihzx, lxvb16x, stxvw4x, stxsibx, stxvh8x,
      stxsihx, stxvb16x, lxsd, lxssp, lxv, stxsd, stxssp, stxv.
      
      These instructions are handled both in the analyse_instr phase and in
      the emulate_step phase.
      
      The code for lxvd2ux and stxvd2ux has been taken out, as those
      instructions were never implemented in any processor and have been
      taken out of the architecture, and their opcodes have been reused for
      other instructions in POWER9 (lxvb16x and stxvb16x).
      
      The emulation for the VSX loads and stores uses helper functions
      which don't access registers or memory directly, which can hopefully
      be reused by KVM later.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Don't check MSR FP/VMX/VSX enable bits in analyse_instr() · ee0a54d7
      Paul Mackerras authored
      This removes the checks for the FP/VMX/VSX enable bits in the MSR
      from analyse_instr() and adds them to emulate_step() instead.
      
      The reason for this is that we may want to use analyse_instr() in
      a situation where the FP/VMX/VSX register values are stored in the
      current thread_struct and the FP/VMX/VSX enable bits in the MSR
      image in the pt_regs are zero.  Since analyse_instr() doesn't make
      any changes to register state, it is reasonable for it to indicate
      what the effect of an instruction would be even though the relevant
      enable bit is off.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Change analyse_instr so it doesn't modify *regs · 3cdfcbfd
      Paul Mackerras authored
      The analyse_instr function currently doesn't just work out what an
      instruction does, it also executes those instructions whose effect
      is only to update CPU registers that are stored in struct pt_regs.
      This is undesirable because optprobes uses analyse_instr to work out
      if an instruction could be successfully emulated in future.
      
      This changes analyse_instr so it doesn't modify *regs; instead it
      stores information in the instruction_op structure to indicate what
      registers (GPRs, CR, XER, LR) would be set and what value they would
      be set to.  A companion function called emulate_update_regs() can
      then use that information to update a pt_regs struct appropriately.
      
      As a minor cleanup, this replaces inline asm using the cntlzw and
      cntlzd instructions with calls to __builtin_clz() and __builtin_clzl().
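      The analyse-then-update split can be sketched with one
      instruction, count leading zeros word. The types sketch_regs and
      sketch_op and all function names here are hypothetical and far
      smaller than the kernel's pt_regs and instruction_op. The sketch
      assumes GCC or Clang for __builtin_clz(), whose result is
      undefined for a zero argument, so zero is handled separately.

```c
#include <stdint.h>

struct sketch_regs {
    uint64_t gpr[32];
};

/* Records which register would be written, and with what value. */
struct sketch_op {
    int reg;
    uint64_t val;
};

/* Leading zeros of a 32-bit value; a zero input yields 32. */
static unsigned int clz32(uint32_t v)
{
    return v ? (unsigned int)__builtin_clz(v) : 32;
}

/* Phase 1: work out the effect. regs is const and is never modified. */
static void analyse_cntlzw(struct sketch_op *op,
                           const struct sketch_regs *regs, int rs, int ra)
{
    op->reg = ra;
    op->val = clz32((uint32_t)regs->gpr[rs]);
}

/* Phase 2: apply the recorded effect to the register state. */
static void update_regs(struct sketch_regs *regs, const struct sketch_op *op)
{
    regs->gpr[op->reg] = op->val;
}
```

      A caller that only wants to know whether an instruction can be
      emulated would run the first phase and skip the second.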
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  2. 31 Aug, 2017 18 commits