Demonstrations of stackcount, the Linux eBPF/bcc version.


This program traces kernel functions and frequency counts them with their entire
kernel stack trace, summarized in-kernel for efficiency. For example, counting
stack traces that led to submit_bio(), which creates block device I/O:

# ./stackcount submit_bio
Tracing 1 functions for "submit_bio"... Hit Ctrl-C to end.
^C
  submit_bio
  submit_bh
  journal_submit_commit_record.isra.13
  jbd2_journal_commit_transaction
  kjournald2
  kthread
  ret_from_fork
  mb_cache_list
    1

  submit_bio
  __block_write_full_page.constprop.39
  block_write_full_page
  blkdev_writepage
  __writepage
  write_cache_pages
  generic_writepages
  do_writepages
  __writeback_single_inode
  writeback_sb_inodes
  __writeback_inodes_wb
    2

  submit_bio
  __block_write_full_page.constprop.39
  block_write_full_page
  blkdev_writepage
  __writepage
  write_cache_pages
  generic_writepages
  do_writepages
  __filemap_fdatawrite_range
  filemap_fdatawrite
  fdatawrite_one_bdev
    36

  submit_bio
  submit_bh
  jbd2_journal_commit_transaction
  kjournald2
  kthread
  ret_from_fork
  mb_cache_list
    38

  submit_bio
  ext4_writepages
  do_writepages
  __filemap_fdatawrite_range
  filemap_flush
  ext4_alloc_da_blocks
  ext4_rename
  ext4_rename2
  vfs_rename
  sys_rename
  entry_SYSCALL_64_fastpath
    79

Detaching...

The output shows unique stack traces, in order from leaf (on-CPU) to root,
followed by their occurrence count. The last stack trace in the above output
shows syscall handling, ext4_rename(), and filemap_flush(): looks like an
application issued file rename has caused back end disk I/O due to ext4
block allocation and a filemap_flush(). I'd have to browse the code to those
functions to confirm!

The order of printed stack traces is from least to most frequent. The most
frequent in this case, the ext4_rename() stack, was taken 79 times during
tracing.

It can be useful to trace the path to submit_bio to explain unusual rates of
disk IOPS. These could have in-kernel origins (eg, background scrub).


As another example, here are the code paths that led to ip_output(), which
sends a packet at the IP level:

# ./stackcount ip_output
Tracing 1 functions for "ip_output"... Hit Ctrl-C to end.
^C
  ip_output
  ip_queue_xmit
  tcp_transmit_skb
  tcp_write_xmit
  tcp_tsq_handler.part.32
  tcp_tasklet_func
  tasklet_action
  __do_softirq
  do_softirq_own_stack
  do_softirq
  __local_bh_enable_ip
    1

  ip_output
  ip_queue_xmit
  tcp_transmit_skb
  tcp_send_ack
  tcp_send_delayed_ack
  __tcp_ack_snd_check
  tcp_rcv_established
  tcp_v4_do_rcv
  tcp_v4_rcv
  ip_local_deliver_finish
  ip_local_deliver
    1

  ip_output
  ip_queue_xmit
  tcp_transmit_skb
  tcp_send_ack
  tcp_send_delayed_ack
  __tcp_ack_snd_check
  tcp_rcv_established
  tcp_v4_do_rcv
  tcp_v4_rcv
  ip_local_deliver_finish
  ip_local_deliver
    1

  ip_output
  ip_queue_xmit
  tcp_transmit_skb
  tcp_send_ack
  tcp_delack_timer_handler
  tcp_delack_timer
  call_timer_fn
  run_timer_softirq
  __do_softirq
  irq_exit
  xen_evtchn_do_upcall
    1

  ip_output
  ip_queue_xmit
  tcp_transmit_skb
  tcp_write_xmit
  __tcp_push_pending_frames
  tcp_rcv_established
  tcp_v4_do_rcv
  release_sock
  tcp_sendmsg
  inet_sendmsg
  sock_sendmsg
    3

  ip_output
  ip_queue_xmit
  tcp_transmit_skb
  tcp_write_xmit
  tcp_tsq_handler.part.32
  tcp_tasklet_func
  tasklet_action
  __do_softirq
  run_ksoftirqd
  smpboot_thread_fn
  kthread
    3

  ip_output
  ip_queue_xmit
  tcp_transmit_skb
  tcp_write_xmit
  __tcp_push_pending_frames
  tcp_rcv_established
  tcp_v4_do_rcv
  tcp_v4_rcv
  ip_local_deliver_finish
  ip_local_deliver
  ip_rcv_finish
    4

  ip_output
  ip_queue_xmit
  tcp_transmit_skb
  tcp_send_ack
  tcp_cleanup_rbuf
  tcp_recvmsg
  inet_recvmsg
  sock_recvmsg
  sock_read_iter
  __vfs_read
  vfs_read
    5

  ip_output
  ip_queue_xmit
  tcp_transmit_skb
  tcp_write_xmit
  __tcp_push_pending_frames
  tcp_rcv_established
  tcp_v4_do_rcv
  release_sock
  tcp_sendmsg
  inet_sendmsg
  sock_sendmsg
    9

  ip_output
  ip_queue_xmit
  tcp_transmit_skb
  tcp_write_xmit
  __tcp_push_pending_frames
  tcp_push
  tcp_sendmsg
  inet_sendmsg
  sock_sendmsg
  sock_write_iter
  __vfs_write
    51

  ip_output
  ip_queue_xmit
  tcp_transmit_skb
  tcp_write_xmit
  __tcp_push_pending_frames
  tcp_rcv_established
  tcp_v4_do_rcv
  tcp_v4_rcv
  ip_local_deliver_finish
  ip_local_deliver
  ip_rcv_finish
    171

  ip_output
  ip_queue_xmit
  tcp_transmit_skb
  tcp_write_xmit
  __tcp_push_pending_frames
  tcp_rcv_established
  tcp_v4_do_rcv
  tcp_v4_rcv
  ip_local_deliver_finish
  ip_local_deliver
  ip_rcv_finish
    994

  ip_output
  ip_queue_xmit
  tcp_transmit_skb
  tcp_write_xmit
  tcp_tsq_handler.part.32
  tcp_tasklet_func
  tasklet_action
  __do_softirq
  irq_exit
  xen_evtchn_do_upcall
  xen_do_hypervisor_callback
    1002

Detaching...

The last two most frequent stack traces are via tcp_transmit_skb(). Note the
send last begins with ip_rcv_finish(), for a local receive, which then becomes
a transmit: likely pushing more frames when processing the newly received ones.

As may be obvious, this is a great tool for quickly understanding kernel code
flow.


A -i option can be used to set an output interval, and -T to include a
timestamp. For example:

# ./stackcount -Ti 1 submit_bio
Tracing 1 functions for "submit_bio"... Hit Ctrl-C to end.

01:11:37
  submit_bio
  submit_bh
  journal_submit_commit_record.isra.13
  jbd2_journal_commit_transaction
  kjournald2
  kthread
  ret_from_fork
  mb_cache_list
    1

  submit_bio
  ext4_writepages
  do_writepages
  __filemap_fdatawrite_range
  filemap_flush
  ext4_alloc_da_blocks
  ext4_rename
  ext4_rename2
  vfs_rename
  sys_rename
  entry_SYSCALL_64_fastpath
    20

  submit_bio
  submit_bh
  jbd2_journal_commit_transaction
  kjournald2
  kthread
  ret_from_fork
  mb_cache_list
    39


01:11:38
  submit_bio
  ext4_writepages
  do_writepages
  __filemap_fdatawrite_range
  filemap_flush
  ext4_alloc_da_blocks
  ext4_rename
  ext4_rename2
  vfs_rename
  sys_rename
  entry_SYSCALL_64_fastpath
    20


01:11:39
  submit_bio
  ext4_writepages
  do_writepages
  __filemap_fdatawrite_range
  filemap_flush
  ext4_alloc_da_blocks
  ext4_rename
  ext4_rename2
  vfs_rename
  sys_rename
  entry_SYSCALL_64_fastpath
    20

^C
01:11:39
  submit_bio
  ext4_writepages
  do_writepages
  __filemap_fdatawrite_range
  filemap_flush
  ext4_alloc_da_blocks
  ext4_rename
  ext4_rename2
  vfs_rename
  sys_rename
  entry_SYSCALL_64_fastpath
    20

Detaching...


The -s output prints the return instruction offset for each function (aka
symbol offset). Eg:

# ./stackcount -s tcp_sendmsg
Tracing 1 functions for "tcp_sendmsg"... Hit Ctrl-C to end.
^C
  tcp_sendmsg0x1
  sock_sendmsg0x38
  sock_write_iter0x78
  __vfs_write0xaa
  vfs_write0xa9
  sys_write0x46
  entry_SYSCALL_64_fastpath0x16
    29

Detaching...

If it wasn't clear how one function called another, knowing the instruction
offset can help you locate the lines of code from a disassembly dump.


A wildcard can also be used. Eg, all functions beginning with "tcp_send":

# ./stackcount -s 'tcp_send*'
Tracing 14 functions for "tcp_send*"... Hit Ctrl-C to end.
^C
  tcp_send_delayed_ack0x1
  tcp_rcv_established0x3b1
  tcp_v4_do_rcv0x130
  tcp_v4_rcv0x8e0
  ip_local_deliver_finish0x9f
  ip_local_deliver0x51
  ip_rcv_finish0x8a
  ip_rcv0x29d
  __netif_receive_skb_core0x637
  __netif_receive_skb0x18
  netif_receive_skb_internal0x23
    1

  tcp_send_delayed_ack0x1
  tcp_rcv_established0x222
  tcp_v4_do_rcv0x130
  tcp_v4_rcv0x8e0
  ip_local_deliver_finish0x9f
  ip_local_deliver0x51
  ip_rcv_finish0x8a
  ip_rcv0x29d
  __netif_receive_skb_core0x637
  __netif_receive_skb0x18
  netif_receive_skb_internal0x23
    4

  tcp_send_mss0x1
  inet_sendmsg0x67
  sock_sendmsg0x38
  sock_write_iter0x78
  __vfs_write0xaa
  vfs_write0xa9
  sys_write0x46
  entry_SYSCALL_64_fastpath0x16
    7

  tcp_sendmsg0x1
  sock_sendmsg0x38
  sock_write_iter0x78
  __vfs_write0xaa
  vfs_write0xa9
  sys_write0x46
  entry_SYSCALL_64_fastpath0x16
    7

Detaching...

Use -r to allow regular expressions.


USAGE message:

# ./stackcount -h
usage: stackcount [-h] [-p PID] [-i INTERVAL] [-T] [-r] [-s] [-v] pattern

Count kernel function calls and their stack traces

positional arguments:
  pattern               search expression for kernel functions

optional arguments:
  -h, --help            show this help message and exit
  -p PID, --pid PID     trace this PID only
  -i INTERVAL, --interval INTERVAL
                        summary interval, seconds
  -T, --timestamp       include timestamp on output
  -r, --regexp          use regular expressions. Default is "*" wildcards
                        only.
  -s, --offset          show address offsets
  -v, --verbose         show raw addresses

examples:
    ./stackcount submit_bio       # count kernel stack traces for submit_bio
    ./stackcount ip_output        # count kernel stack traces for ip_output
    ./stackcount -s ip_output     # show symbol offsets
    ./stackcount -sv ip_output    # show offsets and raw addresses (verbose)
    ./stackcount 'tcp_send*'      # count stacks for funcs matching tcp_send*
    ./stackcount -r '^tcp_send.*' # same as above, using regular expressions
    ./stackcount -Ti 5 ip_output  # output every 5 seconds, with timestamps
    ./stackcount -p 185 ip_output # count ip_output stacks for PID 185 only