trace_example.txt 13.3 KB
Newer Older
Sasha Goldshtein's avatar
trace  
Sasha Goldshtein committed
1 2 3 4
Demonstrations of trace.


trace probes functions you specify and displays trace messages if a particular
5 6
condition is met. You can control the message format to display function
arguments and return values.
Sasha Goldshtein's avatar
trace  
Sasha Goldshtein committed
7 8 9 10

For example, suppose you want to trace all commands being exec'd across the
system:

11
# trace 'sys_execve "%s", arg1'
12 13 14 15 16 17 18 19 20 21 22 23 24 25
PID    COMM         FUNC             -
4402   bash         sys_execve       /usr/bin/man
4411   man          sys_execve       /usr/local/bin/less
4411   man          sys_execve       /usr/bin/less
4410   man          sys_execve       /usr/local/bin/nroff
4410   man          sys_execve       /usr/bin/nroff
4409   man          sys_execve       /usr/local/bin/tbl
4409   man          sys_execve       /usr/bin/tbl
4408   man          sys_execve       /usr/local/bin/preconv
4408   man          sys_execve       /usr/bin/preconv
4415   nroff        sys_execve       /usr/bin/locale
4416   nroff        sys_execve       /usr/bin/groff
4418   groff        sys_execve       /usr/bin/grotty
4417   groff        sys_execve       /usr/bin/troff
Sasha Goldshtein's avatar
trace  
Sasha Goldshtein committed
26 27 28 29 30 31 32 33 34 35 36 37 38 39
^C

The ::sys_execve syntax specifies that you want an entry probe (which is the
default), in a kernel function (which is the default) called sys_execve. Next,
the format string to print is simply "%s", which prints a string. Finally, the
value to print is the first argument to the sys_execve function, which happens
to be the command that is exec'd. The above trace was generated by executing
"man ls" in a separate shell. As you see, man executes a number of additional
programs to finally display the man page.

Next, suppose you are looking for large reads across the system. Let's trace
the read system call and inspect the third argument, which is the number of
bytes to be read:

40
# trace 'sys_read (arg3 > 20000) "read %d bytes", arg3'
41 42 43 44 45
PID    COMM         FUNC             -
4490   dd           sys_read         read 1048576 bytes
4490   dd           sys_read         read 1048576 bytes
4490   dd           sys_read         read 1048576 bytes
4490   dd           sys_read         read 1048576 bytes
Sasha Goldshtein's avatar
trace  
Sasha Goldshtein committed
46 47 48 49 50 51 52 53 54 55 56 57
^C

During the trace, I executed "dd if=/dev/zero of=/dev/null bs=1M count=4".
The individual reads are visible, with the custom format message printed for
each read. The parenthesized expression "(arg3 > 20000)" is a filter that is
evaluated for each invocation of the probe before printing anything.

You can also trace user functions. For example, let's simulate the bashreadline
script, which attaches to the readline function in bash and prints its return
value, effectively snooping all bash shell input across the system:

# trace 'r:bash:readline "%s", retval'
58 59 60
PID    COMM         FUNC             -
2740   bash         readline         echo hi!
2740   bash         readline         man ls
Sasha Goldshtein's avatar
trace  
Sasha Goldshtein committed
61 62 63 64 65 66 67 68 69
^C

The special retval keywords stands for the function's return value, and can
be used only in a retprobe, specified by the 'r' prefix. The next component
of the probe is the library that contains the desired function. It's OK to
specify executables too, as long as they can be found in the PATH. Or, you
can specify the full path to the executable (e.g. "/usr/bin/bash").

Multiple probes can be combined on the same command line. For example, let's
70
trace failed read and write calls on the libc level, and include a time column:
Sasha Goldshtein's avatar
trace  
Sasha Goldshtein committed
71 72

# trace 'r:c:read ((int)retval < 0) "read failed: %d", retval' \
73
        'r:c:write ((int)retval < 0) "write failed: %d", retval' -T
Sasha Goldshtein's avatar
trace  
Sasha Goldshtein committed
74 75 76 77 78 79 80 81 82
TIME     PID    COMM         FUNC             -
05:31:57 3388   bash         write            write failed: -1
05:32:00 3388   bash         write            write failed: -1
^C

Note that the retval variable must be cast to int before comparing to zero.
The reason is that the default type for argN and retval is an unsigned 64-bit
integer, which can never be smaller than 0.

83 84 85 86
trace has also some basic support for kernel tracepoints. For example, let's
trace the block:block_rq_complete tracepoint and print out the number of sectors
transferred:

87
# trace 't:block:block_rq_complete "sectors=%d", args->nr_sector' -T
88 89 90 91 92 93
TIME     PID    COMM         FUNC             -
01:23:51 0      swapper/0    block_rq_complete sectors=8
01:23:55 10017  kworker/u64: block_rq_complete sectors=1
01:23:55 0      swapper/0    block_rq_complete sectors=8
^C

94 95
To discover the tracepoint structure format (which you can refer to as the "args"
pointer variable), use the tplist tool. For example:
96 97 98 99 100 101 102 103 104

# tplist -v block:block_rq_complete
block:block_rq_complete
    dev_t dev;
    sector_t sector;
    unsigned int nr_sector;
    int errors;
    char rwbs[8];

105
This output tells you that you can use "args->dev", "args->sector", etc. in your
106 107
predicate and trace arguments.

108 109 110

More and more high-level libraries are instrumented with USDT probe support.
These probes can be traced by trace just like kernel tracepoints. For example,
111 112 113 114 115 116 117 118 119 120 121
trace new threads being created and their function name, include time column
and on which CPU it happened:

# trace 'u:pthread:pthread_create "%U", arg3' -T -C
TIME     CPU PID     TID     COMM            FUNC             -
13:22:01 25  2627    2629    automount       pthread_create   expire_proc_indirect+0x0 [automount]
13:22:01 5   21360   21414   osqueryd        pthread_create   [unknown] [osqueryd]
13:22:03 25  2627    2629    automount       pthread_create   expire_proc_indirect+0x0 [automount]
13:22:04 15  21360   21414   osqueryd        pthread_create   [unknown] [osqueryd]
13:22:07 25  2627    2629    automount       pthread_create   expire_proc_indirect+0x0 [automount]
13:22:07 4   21360   21414   osqueryd        pthread_create   [unknown] [osqueryd]
122 123 124 125 126 127 128 129 130
^C

The "%U" format specifier tells trace to resolve arg3 as a user-space symbol,
if possible. Similarly, use "%K" for kernel symbols.

Ruby, Node, and OpenJDK are also instrumented with USDT. For example, let's
trace Ruby methods being called (this requires a version of Ruby built with 
the --enable-dtrace configure flag):

131
# trace 'u:ruby:method__entry "%s.%s", arg1, arg2' -p $(pidof irb) -T
132 133 134 135 136 137 138 139 140
TIME     PID    COMM         FUNC             -
12:08:43 18420  irb          method__entry    IRB::Context.verbose?
12:08:43 18420  irb          method__entry    RubyLex.ungetc
12:08:43 18420  irb          method__entry    RuxyLex.debug?
^C

In the previous invocation, arg1 and arg2 are the class name and method name
for the Ruby method being invoked.

141 142 143 144 145 146 147 148 149 150
You can also trace exported functions from shared libraries, or an imported
function on the actual executable:

# sudo ./trace.py 'r:/usr/lib64/libtinfo.so:curses_version "Version=%s", retval'
# tput -V

PID    TID    COMM         FUNC             -
21720  21720  tput         curses_version   Version=ncurses 6.0.20160709
^C

151

152 153 154
Occasionally, it can be useful to filter specific strings. For example, you
might be interested in open() calls that open a specific file:

155
# trace 'p:c:open (STRCMP("test.txt", arg1)) "opening %s", arg1' -T
156 157 158 159 160 161
TIME     PID    COMM         FUNC             -
01:43:15 10938  cat          open             opening test.txt
01:43:20 10939  cat          open             opening test.txt
^C


162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179
In the preceding example, as well as in many others, readability may be
improved by providing the function's signature, which names the arguments and
lets you access structure sub-fields, which is hard with the "arg1", "arg2"
convention. For example:

# trace 'p:c:open(char *filename) "opening %s", filename'
PID    TID    COMM         FUNC             -
17507  17507  cat          open             opening FAQ.txt
^C

# trace 'p::SyS_nanosleep(struct timespec *ts) "sleep for %lld ns", ts->tv_nsec'
PID    TID    COMM         FUNC             -
777    785    automount    SyS_nanosleep    sleep for 500000000 ns
777    785    automount    SyS_nanosleep    sleep for 500000000 ns
777    785    automount    SyS_nanosleep    sleep for 500000000 ns
777    785    automount    SyS_nanosleep    sleep for 500000000 ns
^C

180 181 182
Remember to use the -I argument include the appropriate header file. We didn't
need to do that here because `struct timespec` is used internally by the tool,
so it always includes this header file.
183

184 185

As a final example, let's trace open syscalls for a specific process. By
Sasha Goldshtein's avatar
trace  
Sasha Goldshtein committed
186 187
default, tracing is system-wide, but the -p switch overrides this:

188
# trace -p 2740 'do_sys_open "%s", arg2' -T
Sasha Goldshtein's avatar
trace  
Sasha Goldshtein committed
189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206
TIME     PID    COMM         FUNC             -
05:36:16 15872  ls           do_sys_open      /etc/ld.so.cache
05:36:16 15872  ls           do_sys_open      /lib64/libselinux.so.1
05:36:16 15872  ls           do_sys_open      /lib64/libcap.so.2
05:36:16 15872  ls           do_sys_open      /lib64/libacl.so.1
05:36:16 15872  ls           do_sys_open      /lib64/libc.so.6
05:36:16 15872  ls           do_sys_open      /lib64/libpcre.so.1
05:36:16 15872  ls           do_sys_open      /lib64/libdl.so.2
05:36:16 15872  ls           do_sys_open      /lib64/libattr.so.1
05:36:16 15872  ls           do_sys_open      /lib64/libpthread.so.0
05:36:16 15872  ls           do_sys_open      /usr/lib/locale/locale-archive
05:36:16 15872  ls           do_sys_open      /home/vagrant
^C

In this example, we traced the "ls ~" command as it was opening its shared
libraries and then accessing the /home/vagrant directory listing.


207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224
Lastly, if a high-frequency event is traced you may overflow the perf ring
buffer. This shows as "Lost N samples":

# trace sys_open
5087   5087   pgrep        sys_open
5087   5087   pgrep        sys_open
5087   5087   pgrep        sys_open
5087   5087   pgrep        sys_open
5087   5087   pgrep        sys_open
Lost 764896 samples
Lost 764896 samples
Lost 764896 samples

The perf ring buffer size can be changed with -b. The unit is size per-CPU buffer
size and is measured in pages. The value must be a power of two and defaults to
64 pages.


Sasha Goldshtein's avatar
trace  
Sasha Goldshtein committed
225 226
USAGE message:

227 228
usage: trace [-h] [-b BUFFER_PAGES] [-p PID] [-L TID] [-v] [-Z STRING_SIZE]
             [-S] [-M MAX_EVENTS] [-t] [-T] [-K] [-U] [-I header]
229
             probe [probe ...]
Sasha Goldshtein's avatar
trace  
Sasha Goldshtein committed
230 231 232 233 234 235 236 237

Attach to functions and print trace messages.

positional arguments:
  probe                 probe specifier (see examples)

optional arguments:
  -h, --help            show this help message and exit
238 239 240
  -b BUFFER_PAGES, --buffer-pages BUFFER_PAGES
                        number of pages to use for perf_events ring buffer
                        (default: 64)
Sasha Goldshtein's avatar
trace  
Sasha Goldshtein committed
241
  -p PID, --pid PID     id of the process to trace (optional)
242
  -L TID, --tid TID     id of the thread to trace (optional)
Sasha Goldshtein's avatar
trace  
Sasha Goldshtein committed
243 244 245 246 247 248
  -v, --verbose         print resulting BPF program code before executing
  -Z STRING_SIZE, --string-size STRING_SIZE
                        maximum size to read from strings
  -S, --include-self    do not filter trace's own pid from the trace
  -M MAX_EVENTS, --max-events MAX_EVENTS
                        number of events to print before quitting
249
  -t, --timestamp       print timestamp column (offset from trace start)
250
  -T, --time            print time column
251
  -C, --print_cpu       print CPU id 
252 253 254 255
  -K, --kernel-stack    output kernel stack trace
  -U, --user-stack      output user stack trace
  -I header, --include header
                        additional header files to include in the BPF program
256 257
                        as either full path, or relative to current working directory,
                        or relative to default kernel header search path
Sasha Goldshtein's avatar
trace  
Sasha Goldshtein committed
258 259 260

EXAMPLES:

261
trace do_sys_open
Sasha Goldshtein's avatar
trace  
Sasha Goldshtein committed
262
        Trace the open syscall and print a default trace message when entered
263
trace 'do_sys_open "%s", arg2'
Sasha Goldshtein's avatar
trace  
Sasha Goldshtein committed
264
        Trace the open syscall and print the filename being opened
265
trace 'sys_read (arg3 > 20000) "read %d bytes", arg3'
Sasha Goldshtein's avatar
trace  
Sasha Goldshtein committed
266
        Trace the read syscall and print a message for reads >20000 bytes
267
trace 'r::do_sys_open "%llx", retval'
268
        Trace the return from the open syscall and print the return value
269
trace 'c:open (arg2 == 42) "%s %d", arg1, arg2'
Sasha Goldshtein's avatar
trace  
Sasha Goldshtein committed
270
        Trace the open() call from libc only if the flags (arg2) argument is 42
271
trace 'c:malloc "size = %d", arg1'
Sasha Goldshtein's avatar
trace  
Sasha Goldshtein committed
272
        Trace malloc calls and print the size being allocated
273 274
trace 'p:c:write (arg1 == 1) "writing %d bytes to STDOUT", arg3'
        Trace the write() call from libc to monitor writes to STDOUT
275
trace 'r::__kmalloc (retval == 0) "kmalloc failed!"'
276
        Trace returns from __kmalloc which returned a null pointer
277
trace 'r:c:malloc (retval) "allocated = %x", retval'
Sasha Goldshtein's avatar
trace  
Sasha Goldshtein committed
278
        Trace returns from malloc and print non-NULL allocated buffers
279
trace 't:block:block_rq_complete "sectors=%d", args->nr_sector'
280
        Trace the block_rq_complete kernel tracepoint and print # of tx sectors
281 282
trace 'u:pthread:pthread_create (arg4 != 0)'
        Trace the USDT probe pthread_create when its 4th argument is non-zero
283 284
trace 'p::SyS_nanosleep(struct timespec *ts) "sleep for %lld ns", ts->tv_nsec'
        Trace the nanosleep syscall and print the sleep duration in ns
285 286 287 288 289 290 291 292 293 294
trace -I 'linux/fs.h' \
      'p::uprobe_register(struct inode *inode) "a_ops = %llx", inode->i_mapping->a_ops'
        Trace the uprobe_register inode mapping ops, and the symbol can be found
        in /proc/kallsyms
trace -I 'kernel/sched/sched.h' \
      'p::__account_cfs_rq_runtime(struct cfs_rq *cfs_rq) "%d", cfs_rq->runtime_remaining'
        Trace the cfs scheduling runqueue remaining runtime. The struct cfs_rq is defined
        in kernel/sched/sched.h which is in kernel source tree and not in kernel-devel
        package.  So this command needs to run at the kernel source tree root directory
        so that the added header file can be found by the compiler.
295 296 297 298 299
trace -I 'net/sock.h' \\
      'udpv6_sendmsg(struct sock *sk) (sk->sk_dport == 13568)'
        Trace udpv6 sendmsg calls only if socket's destination port is equal
        to 53 (DNS; 13568 in big endian order)
"