1. 22 Apr, 2014 1 commit
  2. 18 Apr, 2014 6 commits
  3. 17 Apr, 2014 17 commits
    • Oleg Nesterov's avatar
      uprobes/x86: Emulate relative conditional "near" jmp's · 6cc5e7ff
      Oleg Nesterov authored
      Change branch_setup_xol_ops() to simply use opc1 = OPCODE2(insn) - 0x10
      if OPCODE1() == 0x0f; this matches the "short" jmp which checks the same
      condition.
      
      Thanks to lib/insn.c, it does the rest correctly. branch->ilen/offs are
      correct no matter if this jmp is "near" or "short".
      Reported-by: default avatarJonathan Lebon <jlebon@redhat.com>
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Reviewed-by: default avatarJim Keniston <jkenisto@us.ibm.com>
      6cc5e7ff
    • Oleg Nesterov's avatar
      uprobes/x86: Emulate relative conditional "short" jmp's · 8f95505b
      Oleg Nesterov authored
      Teach branch_emulate_op() to emulate the conditional "short" jmp's which
      check regs->flags.
      
      Note: this doesn't support jcxz/jcexz, loope/loopz, and loopne/loopnz.
      They all are rel8 and thus they can't trigger the problem, but perhaps
      we will add the support in future just for completeness.
      Reported-by: default avatarJonathan Lebon <jlebon@redhat.com>
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Reviewed-by: default avatarJim Keniston <jkenisto@us.ibm.com>
      8f95505b
    • Oleg Nesterov's avatar
      uprobes/x86: Emulate relative call's · 8e89c0be
      Oleg Nesterov authored
      See the previous "Emulate unconditional relative jmp's" which explains
      why we can not execute "jmp" out-of-line, the same applies to "call".
      
      Emulating of rip-relative call is trivial, we only need to additionally
      push the ret-address. If this fails, we execute this instruction out of
      line and this should trigger the trap, the probed application should die
      or the same insn will be restarted if a signal handler expands the stack.
      We do not even need ->post_xol() for this case.
      
      But there is a corner (and almost theoretical) case: another thread can
      expand the stack right before we execute this insn out of line. In this
      case it hit the same problem we are trying to solve. So we simply turn
      the probed insn into "call 1f; 1:" and add ->post_xol() which restores
      ->sp and restarts.
      
      Many thanks to Jonathan who finally found the standalone reproducer,
      otherwise I would never resolve the "random SIGSEGV's under systemtap"
      bug-report. Now that the problem is clear we can write the simplified
      test-case:
      
      	void probe_func(void), callee(void);
      
      	int failed = 1;
      
      	asm (
      		".text\n"
      		".align 4096\n"
      		".globl probe_func\n"
      		"probe_func:\n"
      		"call callee\n"
      		"ret"
      	);
      
      	/*
      	 * This assumes that:
      	 *
      	 *	- &probe_func = 0x401000 + a_bit, aligned = 0x402000
      	 *
      	 *	- xol_vma->vm_start = TASK_SIZE_MAX - PAGE_SIZE = 0x7fffffffe000
      	 *	  as xol_add_vma() asks; the 1st slot = 0x7fffffffe080
      	 *
      	 * so we can target the non-canonical address from xol_vma using
      	 * the simple math below, 100 * 4096 is just the random offset
      	 */
      	asm (".org . + 0x800000000000 - 0x7fffffffe080 - 5 - 1  + 100 * 4096\n");
      
      	void callee(void)
      	{
      		failed = 0;
      	}
      
      	int main(void)
      	{
      		probe_func();
      		return failed;
      	}
      
      It SIGSEGV's if you probe "probe_func" (although this is not very reliable,
      randomize_va_space/etc can change the placement of xol area).
      
      Note: as Denys Vlasenko pointed out, amd and intel treat "callw" (0x66 0xe8)
      differently. This patch relies on lib/insn.c and thus implements the intel's
      behaviour: 0x66 is simply ignored. Fortunately nothing sane should ever use
      this insn, so we postpone the fix until we decide what should we do; emulate
      or not, support or not, etc.
      Reported-by: default avatarJonathan Lebon <jlebon@redhat.com>
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Reviewed-by: default avatarJim Keniston <jkenisto@us.ibm.com>
      8e89c0be
    • Oleg Nesterov's avatar
      uprobes/x86: Emulate nop's using ops->emulate() · d2410063
      Oleg Nesterov authored
      Finally we can kill the ugly (and very limited) code in __skip_sstep().
      Just change branch_setup_xol_ops() to treat "nop" as jmp to the next insn.
      
      Thanks to lib/insn.c, it is clever enough. OPCODE1() == 0x90 includes
      "(rep;)+ nop;" at least, and (afaics) much more.
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Reviewed-by: default avatarJim Keniston <jkenisto@us.ibm.com>
      d2410063
    • Oleg Nesterov's avatar
      uprobes/x86: Emulate unconditional relative jmp's · 7ba6db2d
      Oleg Nesterov authored
      Currently we always execute all insns out-of-line, including relative
      jmp's and call's. This assumes that even if regs->ip points to nowhere
      after the single-step, default_post_xol_op(UPROBE_FIX_IP) logic will
      update it correctly.
      
      However, this doesn't work if this regs->ip == xol_vaddr + insn_offset
      is not canonical. In this case CPU generates #GP and general_protection()
      kills the task which tries to execute this insn out-of-line.
      
      Now that we have uprobe_xol_ops we can teach uprobes to emulate these
      insns and solve the problem. This patch adds branch_xol_ops which has
      a single branch_emulate_op() hook, so far it can only handle rel8/32
      relative jmp's.
      
      TODO: move ->fixup into the union along with rip_rela_target_address.
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Reported-by: default avatarJonathan Lebon <jlebon@redhat.com>
      Reviewed-by: default avatarJim Keniston <jkenisto@us.ibm.com>
      7ba6db2d
    • Oleg Nesterov's avatar
      uprobes/x86: Introduce sizeof_long(), cleanup adjust_ret_addr() and... · 8faaed1b
      Oleg Nesterov authored
      uprobes/x86: Introduce sizeof_long(), cleanup adjust_ret_addr() and arch_uretprobe_hijack_return_addr()
      
      1. Add the trivial sizeof_long() helper and change other callers of
         is_ia32_task() to use it.
      
         TODO: is_ia32_task() is not what we actually want, TS_COMPAT does
         not necessarily mean 32bit. Fortunately syscall-like insns can't be
         probed so it actually works, but it would be better to rename and
         use is_ia32_frame().
      
      2. As Jim pointed out "ncopied" in arch_uretprobe_hijack_return_addr()
         and adjust_ret_addr() should be named "nleft". And in fact only the
         last copy_to_user() in arch_uretprobe_hijack_return_addr() actually
         needs to inspect the non-zero error code.
      
      TODO: adjust_ret_addr() should die. We can always calculate the value
      we need to write into *regs->sp, just UPROBE_FIX_CALL should record
      insn->length.
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Reviewed-by: default avatarJim Keniston <jkenisto@us.ibm.com>
      8faaed1b
    • Oleg Nesterov's avatar
      uprobes/x86: Teach arch_uprobe_post_xol() to restart if possible · 75f9ef0b
      Oleg Nesterov authored
      SIGILL after the failed arch_uprobe_post_xol() should only be used as
      a last resort, we should try to restart the probed insn if possible.
      
      Currently only adjust_ret_addr() can fail, and this can only happen if
      another thread unmapped our stack after we executed "call" out-of-line.
      Most probably the application if buggy, but even in this case it can
      have a handler for SIGSEGV/etc. And in theory it can be even correct
      and do something non-trivial with its memory.
      
      Of course we can't restart unconditionally, so arch_uprobe_post_xol()
      does this only if ->post_xol() returns -ERESTART even if currently this
      is the only possible error.
      
      default_post_xol_op(UPROBE_FIX_CALL) can always restart, but as Jim
      pointed out it should not forget to pop off the return address pushed
      by this insn executed out-of-line.
      
      Note: this is not "perfect", we do not want the extra handler_chain()
      after restart, but I think this is the best solution we can realistically
      do without too much uglifications.
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Reviewed-by: default avatarMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Reviewed-by: default avatarJim Keniston <jkenisto@us.ibm.com>
      75f9ef0b
    • Oleg Nesterov's avatar
      uprobes/x86: Send SIGILL if arch_uprobe_post_xol() fails · 014940ba
      Oleg Nesterov authored
      Currently the error from arch_uprobe_post_xol() is silently ignored.
      This doesn't look good and this can lead to the hard-to-debug problems.
      
      1. Change handle_singlestep() to loudly complain and send SIGILL.
      
         Note: this only affects x86, ppc/arm can't fail.
      
      2. Change arch_uprobe_post_xol() to call arch_uprobe_abort_xol() and
         avoid TF games if it is going to return an error.
      
         This can help to to analyze the problem, if nothing else we should
         not report ->ip = xol_slot in the core-file.
      
         Note: this means that handle_riprel_post_xol() can be called twice,
         but this is fine because it is idempotent.
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Reviewed-by: default avatarMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Reviewed-by: default avatarJim Keniston <jkenisto@us.ibm.com>
      014940ba
    • Oleg Nesterov's avatar
      uprobes/x86: Conditionalize the usage of handle_riprel_insn() · e55848a4
      Oleg Nesterov authored
      arch_uprobe_analyze_insn() calls handle_riprel_insn() at the start,
      but only "0xff" and "default" cases need the UPROBE_FIX_RIP_ logic.
      Move the callsite into "default" case and change the "0xff" case to
      fall-through.
      
      We are going to add the various hooks to handle the rip-relative
      jmp/call instructions (and more), we need this change to enforce the
      fact that the new code can not conflict with is_riprel_insn() logic
      which, after this change, can only be used by default_xol_ops.
      
      Note: arch_uprobe_abort_xol() still calls handle_riprel_post_xol()
      directly. This is fine unless another _xol_ops we may add later will
      need to reuse "UPROBE_FIX_RIP_AX|UPROBE_FIX_RIP_CX" bits in ->fixup.
      In this case we can add uprobe_xol_ops->abort() hook, which (perhaps)
      we will need anyway in the long term.
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Reviewed-by: default avatarJim Keniston <jkenisto@us.ibm.com>
      Reviewed-by: default avatarMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      e55848a4
    • Oleg Nesterov's avatar
      uprobes/x86: Introduce uprobe_xol_ops and arch_uprobe->ops · 8ad8e9d3
      Oleg Nesterov authored
      Introduce arch_uprobe->ops pointing to the "struct uprobe_xol_ops",
      move the current UPROBE_FIX_{RIP*,IP,CALL} code into the default
      set of methods and change arch_uprobe_pre/post_xol() accordingly.
      
      This way we can add the new uprobe_xol_ops's to handle the insns
      which need the special processing (rip-relative jmp/call at least).
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Reviewed-by: default avatarJim Keniston <jkenisto@us.ibm.com>
      Reviewed-by: default avatarMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      8ad8e9d3
    • Oleg Nesterov's avatar
      uprobes/x86: move the UPROBE_FIX_{RIP,IP,CALL} code at the end of pre/post hooks · 34e7317d
      Oleg Nesterov authored
      No functional changes. Preparation to simplify the review of the next
      change. Just reorder the code in arch_uprobe_pre/post_xol() functions
      so that UPROBE_FIX_{RIP_*,IP,CALL} logic goes to the end.
      
      Also change arch_uprobe_pre_xol() to use utask instead of autask, to
      make the code more symmetrical with arch_uprobe_post_xol().
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Reviewed-by: default avatarMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Reviewed-by: default avatarJim Keniston <jkenisto@us.ibm.com>
      Acked-by: default avatarSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      34e7317d
    • Oleg Nesterov's avatar
      uprobes/x86: Gather "riprel" functions together · d20737c0
      Oleg Nesterov authored
      Cosmetic. Move pre_xol_rip_insn() and handle_riprel_post_xol() up to
      the closely related handle_riprel_insn(). This way it is simpler to
      read and understand this code, and this lessens the number of ifdef's.
      
      While at it, update the comment in handle_riprel_post_xol() as Jim
      suggested.
      
      TODO: rename them somehow to make the naming consistent.
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Reviewed-by: default avatarMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Reviewed-by: default avatarJim Keniston <jkenisto@us.ibm.com>
      d20737c0
    • Oleg Nesterov's avatar
      uprobes/x86: Kill the "ia32_compat" check in handle_riprel_insn(), remove "mm" arg · 59078d4b
      Oleg Nesterov authored
      Kill the "mm->context.ia32_compat" check in handle_riprel_insn(), if
      it is true insn_rip_relative() must return false. validate_insn_bits()
      passed "ia32_compat" as !x86_64 to insn_init(), and insn_rip_relative()
      checks insn->x86_64.
      
      Also, remove the no longer needed "struct mm_struct *mm" argument and
      the unnecessary "return" at the end.
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Reviewed-by: default avatarMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Reviewed-by: default avatarJim Keniston <jkenisto@us.ibm.com>
      Acked-by: default avatarSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      59078d4b
    • Oleg Nesterov's avatar
      uprobes/x86: Fold prepare_fixups() into arch_uprobe_analyze_insn() · ddb69f27
      Oleg Nesterov authored
      No functional changes, preparation.
      
      Shift the code from prepare_fixups() to arch_uprobe_analyze_insn()
      with the following modifications:
      
      	- Do not call insn_get_opcode() again, it was already called
      	  by validate_insn_bits().
      
      	- Move "case 0xea" up. This way "case 0xff" can fall through
      	  to default case.
      
      	- change "case 0xff" to use the nested "switch (MODRM_REG)",
      	  this way the code looks a bit simpler.
      
      	- Make the comments look consistent.
      
      While at it, kill the initialization of rip_rela_target_address and
      ->fixups, we can rely on kzalloc(). We will add the new members into
      arch_uprobe, it would be better to assume that everything is zero by
      default.
      
      TODO: cleanup/fix the mess in validate_insn_bits() paths:
      
      	- validate_insn_64bits() and validate_insn_32bits() should be
      	  unified.
      
      	- "ifdef" is not used consistently; if good_insns_64 depends
      	  on CONFIG_X86_64, then probably good_insns_32 should depend
      	  on CONFIG_X86_32/EMULATION
      
      	- the usage of mm->context.ia32_compat looks wrong if the task
      	  is TIF_X32.
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Reviewed-by: default avatarMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Reviewed-by: default avatarJim Keniston <jkenisto@us.ibm.com>
      Acked-by: default avatarSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      ddb69f27
    • Oleg Nesterov's avatar
      uprobes: Kill UPROBE_SKIP_SSTEP and can_skip_sstep() · 8a6b1732
      Oleg Nesterov authored
      UPROBE_COPY_INSN, UPROBE_SKIP_SSTEP, and uprobe->flags must die. This
      patch kills UPROBE_SKIP_SSTEP. I never understood why it was added;
      not only it doesn't help, it harms.
      
      It can only help to avoid arch_uprobe_skip_sstep() if it was already
      called before and failed. But this is ugly, if we want to know whether
      we can emulate this instruction or not we should do this analysis in
      arch_uprobe_analyze_insn(), not when we hit this probe for the first
      time.
      
      And in fact this logic is simply wrong. arch_uprobe_skip_sstep() can
      fail or not depending on the task/register state, if this insn can be
      emulated but, say, put_user() fails we need to xol it this time, but
      this doesn't mean we shouldn't try to emulate it when this or another
      thread hits this bp next time.
      
      And this is the actual reason for this change. We need to emulate the
      "call" insn, but push(return-address) can obviously fail.
      
      Per-arch notes:
      
      	x86: __skip_sstep() can only emulate "rep;nop". With this
      	     change it will be called every time and most probably
      	     for no reason.
      
      	     This will be fixed by the next changes. We need to
      	     change this suboptimal code anyway.
      
      	arm: Should not be affected. It has its own "bool simulate"
      	     flag checked in arch_uprobe_skip_sstep().
      
      	ppc: Looks like, it can emulate almost everything. Does it
      	     actually need to record the fact that emulate_step()
      	     failed? Hopefully not. But if yes, it can add the ppc-
      	     specific flag into arch_uprobe.
      
      TODO: rename arch_uprobe_skip_sstep() to arch_uprobe_emulate_insn(),
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Reviewed-by: default avatarMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Reviewed-by: default avatarDavid A. Long <dave.long@linaro.org>
      Reviewed-by: default avatarJim Keniston <jkenisto@us.ibm.com>
      Acked-by: default avatarSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      8a6b1732
    • Masami Hiramatsu's avatar
      kprobes/x86: Fix page-fault handling logic · 6381c24c
      Masami Hiramatsu authored
      Current kprobes in-kernel page fault handler doesn't
      expect that its single-stepping can be interrupted by
      an NMI handler which may cause a page fault(e.g. perf
      with callback tracing).
      
      In that case, the page-fault handled by kprobes and it
      misunderstands the page-fault has been caused by the
      single-stepping code and tries to recover IP address
      to probed address.
      
      But the truth is the page-fault has been caused by the
      NMI handler, and do_page_fault failes to handle real
      page fault because the IP address is modified and
      causes Kernel BUGs like below.
      
       ----
       [ 2264.726905] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
       [ 2264.727190] IP: [<ffffffff813c46e0>] copy_user_generic_string+0x0/0x40
      
      To handle this correctly, I fixed the kprobes fault
      handler to ensure the faulted ip address is its own
      single-step buffer instead of checking current kprobe
      state.
      Signed-off-by: default avatarMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Sandeepa Prabhu <sandeepa.prabhu@linaro.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: fche@redhat.com
      Cc: systemtap@sourceware.org
      Link: http://lkml.kernel.org/r/20140417081644.26341.52351.stgit@ltc230.yrl.intra.hitachi.co.jpSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      6381c24c
    • Ingo Molnar's avatar
      Merge tag 'perf-core-for-mingo' of... · b3d5fc3c
      Ingo Molnar authored
      Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf into perf/core
      
      Pull perf/core improvements and fixes from Jiri Olsa:
      
      User visible changes:
      
        * Add --percentage option to control absolute/relative percentage output (Namhyung Kim)
      
      Plumbing changes:
      
        * Add --list-cmds to 'kmem', 'mem', 'lock' and 'sched', for use by completion scripts (Ramkumar Ramachandra)
      Signed-off-by: default avatarJiri Olsa <jolsa@redhat.com>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      b3d5fc3c
  4. 16 Apr, 2014 16 commits