1. 05 Aug, 2015 5 commits
    • Arnaldo Carvalho de Melo's avatar
      perf trace: Use vfs_getname syscall arg beautifier in more syscalls · 34221118
      Arnaldo Carvalho de Melo authored
      Those were covered and tested in this cset:
      
       access, chdir, chmod, chown, chroot, creat, getxattr,
       inotify_add_watch, lchown, lgetxattr, listxattr,
       lsetxattr, mkdir, mkdirat, mknod, rmdir, faccessat,
       newfstatat, openat, readlink, readlinkat, removexattr,
       setxattr, statfs, swapon, swapoff, truncate, unlinkat,
       utime, utimes, utimensat.
      
      E.g.:
      
        # trace -e statfs,access,mkdir mkdir /tmp/bla
         0.285 (0.020 ms): mkdir/2799 access(filename: /etc/ld.so.preload, mode: R         ) = -1 ENOENT No such file or directory
         1.070 (0.032 ms): mkdir/2799 statfs(pathname: /sys/fs/selinux, buf: 0x7ffeafbdc930) = 0
         1.087 (0.013 ms): mkdir/2799 statfs(pathname: /sys/fs/selinux, buf: 0x7ffeafbdc820) = 0
         1.189 (0.014 ms): mkdir/2799 access(filename: /etc/selinux/config                 ) = 0
         1.905 (0.610 ms): mkdir/2799 mkdir(pathname: /tmp/bla, mode: 511                  ) = 0
        #
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Milian Wolff <mail@milianw.de>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-wbqtnlktquun3wtpjdz3okul@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      
        and an empty message aborts the commit.
      34221118
    • Arnaldo Carvalho de Melo's avatar
      perf trace: Deref sys_enter pointer args with contents from probe:vfs_getname · f994592d
      Arnaldo Carvalho de Melo authored
      To work like strace and dereference syscall pointer args we need to
      insert probes (or tracepoints) right after we copy those bytes from
      userspace.
      
      Since we're formatting the syscall args at raw_syscalls:sys_enter time,
      we need to have a formatter that just stores the position where, later,
      when we get the probe:vfs_getname, we can insert the pointer contents.
      
      Now, if a probe:vfs_getname with this format is in place:
      
       # perf probe -l
        probe:vfs_getname (on getname_flags:72@/home/git/linux/fs/namei.c with pathname)
      
      That was, in this case, put in place with:
      
       # perf probe 'vfs_getname=getname_flags:72 pathname=filename:string'
       Added new event:
        probe:vfs_getname    (on getname_flags:72 with pathname=filename:string)
      
       You can now use it in all perf tools, such as:
      
      	perf record -e probe:vfs_getname -aR sleep 1
       #
      
      Then 'perf trace' will notice that and do the pointer -> contents
      expansion:
      
       # trace -e open touch /tmp/bla
        0.165 (0.010 ms): touch/17752 open(filename: /etc/ld.so.cache, flags: CLOEXEC) = 3
        0.195 (0.011 ms): touch/17752 open(filename: /lib64/libc.so.6, flags: CLOEXEC) = 3
        0.512 (0.012 ms): touch/17752 open(filename: /usr/lib/locale/locale-archive, flags: CLOEXEC) = 3
        0.582 (0.012 ms): touch/17752 open(filename: /tmp/bla, flags: CREAT|NOCTTY|NONBLOCK|WRONLY, mode: 438) = 3
       #
      
      Roughly equivalent to strace's output:
      
       # strace -rT -e open touch /tmp/bla
        0.000000 open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3 <0.000039>
        0.000317 open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3 <0.000102>
        0.001461 open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3 <0.000072>
        0.000405 open("/tmp/bla", O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK, 0666) = 3 <0.000055>
        0.000641 +++ exited with 0 +++
       #
      
      Now we need to either look for at all syscalls that are marked as
      pointers and have some well known names ("filename", "pathname", etc)
      and set the arg formatter to the one used for the "open" syscall in this
      patch.
      
      This implementation works for syscalls with just a string being copied
      from userspace, for matching syscalls with more than one string being
      copied via the same probe/trace point (vfs_getname) we need to extend
      the vfs_getname probe spec to include the pointer too, but there are
      some problems with that in 'perf probe' or the kernel kprobes code, need
      to investigate before considering supporting multiple strings per
      syscall.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Milian Wolff <mail@milianw.de>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-xvuwx6nuj8cf389kf9s2ue2s@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      f994592d
    • Arnaldo Carvalho de Melo's avatar
      perf trace: Use a constant for the syscall formatting buffer · e4d44e83
      Arnaldo Carvalho de Melo authored
      We were using it as a magic number, 1024, fix that.
      
      Eventually we need to stop doing it per line, and do it per
      arg, traversing the args at output time, to avoid the memmove()
      calls that will be used in the next cset to replace pointers
      present at raw_syscalls:sys_enter time with its contents that
      appear at probe:vfs_getname time, before raw_syscalls:sys_exit
      time.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Milian Wolff <mail@milianw.de>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-4sz3wid39egay1pp8qmbur4u@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      e4d44e83
    • Arnaldo Carvalho de Melo's avatar
      perf trace: Remember if the vfs_getname tracepoint/kprobe is in place · 08c98776
      Arnaldo Carvalho de Melo authored
      So that we can later decide if we will store where to expand the
      pathname once we are handling vfs_getname or if we should instead
      just go on and straight away print the pointer.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Milian Wolff <mail@milianw.de>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-ytxk5s5jpc50wahffmlxgxuw@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      08c98776
    • Arnaldo Carvalho de Melo's avatar
      perf trace: Do not show syscall tracepoint filter in the --no-syscalls case · 2e5e5f87
      Arnaldo Carvalho de Melo authored
      We were accessing trace->syscalls.events members even when that struct
      wasn't initialized, i.e. --no-syscalls was specified on the command
      line, fix it to show that, still in debug mode, when we have an event
      qualifier list, i.e. when we actually are doing subset syscall tracing.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Milian Wolff <mail@milianw.de>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Fixes: 19867b61 ("perf trace: Use event filters for the event qualifier list")
      Link: http://lkml.kernel.org/n/tip-7980ym6vujgh3yiai0cqzc88@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      2e5e5f87
  2. 04 Aug, 2015 31 commits
  3. 31 Jul, 2015 4 commits
    • Oleg Nesterov's avatar
      uprobes: Fix the waitqueue_active() check in xol_free_insn_slot() · 2a742ced
      Oleg Nesterov authored
      The xol_free_insn_slot()->waitqueue_active() check is buggy. We
      need mb() after we set the conditon for wait_event(), or
      xol_take_insn_slot() can miss the wakeup.
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Pratyush Anand <panand@redhat.com>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20150721134036.GA4799@redhat.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      2a742ced
    • Oleg Nesterov's avatar
      uprobes: Use vm_special_mapping to name the XOL vma · 704bde3c
      Oleg Nesterov authored
      Change xol_add_vma() to use _install_special_mapping(), this way
      we can name the vma installed by uprobes. Currently it looks
      like private anonymous mapping, this is confusing and
      complicates the debugging. With this change /proc/$pid/maps
      reports "[uprobes]".
      
      As a side effect this will cause core dumps to include the XOL vma
      and I think this is good; this can help to debug the problem if
      the app crashed because it was probed.
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Pratyush Anand <panand@redhat.com>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20150721134033.GA4796@redhat.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      704bde3c
    • Oleg Nesterov's avatar
      uprobes: Fix the usage of install_special_mapping() · f58bea2f
      Oleg Nesterov authored
      install_special_mapping(pages) expects that "pages" is the zero-
      terminated array while xol_add_vma() passes &area->page, this
      means that special_mapping_fault() can wrongly use the next
      member in xol_area (vaddr) as "struct page *".
      
      Fortunately, this area is not expandable so pgoff != 0 isn't
      possible (modulo bugs in special_mapping_vmops), but still this
      does not look good.
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Pratyush Anand <panand@redhat.com>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20150721134031.GA4789@redhat.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      f58bea2f
    • Oleg Nesterov's avatar
      uprobes/x86: Make arch_uretprobe_is_alive(RP_CHECK_CALL) more clever · db087ef6
      Oleg Nesterov authored
      The previous change documents that cleanup_return_instances()
      can't always detect the dead frames, the stack can grow. But
      there is one special case which imho worth fixing:
      arch_uretprobe_is_alive() can return true when the stack didn't
      actually grow, but the next "call" insn uses the already
      invalidated frame.
      
      Test-case:
      
      	#include <stdio.h>
      	#include <setjmp.h>
      
      	jmp_buf jmp;
      	int nr = 1024;
      
      	void func_2(void)
      	{
      		if (--nr == 0)
      			return;
      		longjmp(jmp, 1);
      	}
      
      	void func_1(void)
      	{
      		setjmp(jmp);
      		func_2();
      	}
      
      	int main(void)
      	{
      		func_1();
      		return 0;
      	}
      
      If you ret-probe func_1() and func_2() prepare_uretprobe() hits
      the MAX_URETPROBE_DEPTH limit and "return" from func_2() is not
      reported.
      
      When we know that the new call is not chained, we can do the
      more strict check. In this case "sp" points to the new ret-addr,
      so every frame which uses the same "sp" must be dead. The only
      complication is that arch_uretprobe_is_alive() needs to know was
      it chained or not, so we add the new RP_CHECK_CHAIN_CALL enum
      and change prepare_uretprobe() to pass RP_CHECK_CALL only if
      !chained.
      
      Note: arch_uretprobe_is_alive() could also re-read *sp and check
      if this word is still trampoline_vaddr. This could obviously
      improve the logic, but I would like to avoid another
      copy_from_user() especially in the case when we can't avoid the
      false "alive == T" positives.
      Tested-by: default avatarPratyush Anand <panand@redhat.com>
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Acked-by: default avatarSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      Acked-by: default avatarAnton Arapov <arapov@gmail.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20150721134028.GA4786@redhat.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      db087ef6