Commit 65c93628 authored by Martin KaFai Lau's avatar Martin KaFai Lau Committed by Daniel Borkmann

bpftool: Add struct_ops support

This patch adds struct_ops support to the bpftool.

To recap a bit on the recent bpf_struct_ops feature on the kernel side:
It currently supports "struct tcp_congestion_ops" to be implemented
in bpf.  At a high level, bpf_struct_ops is struct_ops map populated
with a number of bpf progs.  bpf_struct_ops currently supports the
"struct tcp_congestion_ops".  However, the bpf_struct_ops design is
generic enough that other kernel struct ops can be supported in
the future.

Although struct_ops is map+progs at a high lever, there are differences
in details.  For example,
1) After registering a struct_ops, the struct_ops is held by the kernel
   subsystem (e.g. tcp-cc).  Thus, there is no need to pin a
   struct_ops map or its progs in order to keep them around.
2) To iterate all struct_ops in a system, it iterates all maps
   in type BPF_MAP_TYPE_STRUCT_OPS.  BPF_MAP_TYPE_STRUCT_OPS is
   the current usual filter.  In the future, it may need to
   filter by other struct_ops specific properties.  e.g. filter by
   tcp_congestion_ops or other kernel subsystem ops in the future.
3) struct_ops requires the running kernel having BTF info.  That allows
   more flexibility in handling other kernel structs.  e.g. it can
   always dump the latest bpf_map_info.
4) Also, "struct_ops" command is not intended to repeat all features
   already provided by "map" or "prog".  For example, if there really
   is a need to pin the struct_ops map, the user can use the "map" cmd
   to do that.

While the first attempt was to reuse parts from map/prog.c,  it ended up
not a lot to share.  The only obvious item is the map_parse_fds() but
that still requires modifications to accommodate struct_ops map specific
filtering (for the immediate and the future needs).  Together with the
earlier mentioned differences, it is better to part away from map/prog.c.

The initial set of subcmds are, register, unregister, show, and dump.

For register, it registers all struct_ops maps that can be found in an
obj file.  Option can be added in the future to specify a particular
struct_ops map.  Also, the common bpf_tcp_cc is stateless (e.g.
bpf_cubic.c and bpf_dctcp.c).  The "reuse map" feature is not
implemented in this patch and it can be considered later also.

For other subcmds, please see the man doc for details.

A sample output of dump:
[root@arch-fb-vm1 bpf]# bpftool struct_ops dump name cubic
[{
        "bpf_map_info": {
            "type": 26,
            "id": 64,
            "key_size": 4,
            "value_size": 256,
            "max_entries": 1,
            "map_flags": 0,
            "name": "cubic",
            "ifindex": 0,
            "btf_vmlinux_value_type_id": 18452,
            "netns_dev": 0,
            "netns_ino": 0,
            "btf_id": 52,
            "btf_key_type_id": 0,
            "btf_value_type_id": 0
        }
    },{
        "bpf_struct_ops_tcp_congestion_ops": {
            "refcnt": {
                "refs": {
                    "counter": 1
                }
            },
            "state": "BPF_STRUCT_OPS_STATE_INUSE",
            "data": {
                "list": {
                    "next": 0,
                    "prev": 0
                },
                "key": 0,
                "flags": 0,
                "init": "void (struct sock *) bictcp_init/prog_id:138",
                "release": "void (struct sock *) 0",
                "ssthresh": "u32 (struct sock *) bictcp_recalc_ssthresh/prog_id:141",
                "cong_avoid": "void (struct sock *, u32, u32) bictcp_cong_avoid/prog_id:140",
                "set_state": "void (struct sock *, u8) bictcp_state/prog_id:142",
                "cwnd_event": "void (struct sock *, enum tcp_ca_event) bictcp_cwnd_event/prog_id:139",
                "in_ack_event": "void (struct sock *, u32) 0",
                "undo_cwnd": "u32 (struct sock *) tcp_reno_undo_cwnd/prog_id:144",
                "pkts_acked": "void (struct sock *, const struct ack_sample *) bictcp_acked/prog_id:143",
                "min_tso_segs": "u32 (struct sock *) 0",
                "sndbuf_expand": "u32 (struct sock *) 0",
                "cong_control": "void (struct sock *, const struct rate_sample *) 0",
                "get_info": "size_t (struct sock *, u32, int *, union tcp_cc_info *) 0",
                "name": "bpf_cubic",
                "owner": 0
            }
        }
    }
]
Signed-off-by: default avatarMartin KaFai Lau <kafai@fb.com>
Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
Acked-by: default avatarQuentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20200318171656.129650-1-kafai@fb.com
parent d5ae04da
==================
bpftool-struct_ops
==================
-------------------------------------------------------------------------------
tool to register/unregister/introspect BPF struct_ops
-------------------------------------------------------------------------------
:Manual section: 8
SYNOPSIS
========
**bpftool** [*OPTIONS*] **struct_ops** *COMMAND*
*OPTIONS* := { { **-j** | **--json** } [{ **-p** | **--pretty** }] }
*COMMANDS* :=
{ **show** | **list** | **dump** | **register** | **unregister** | **help** }
STRUCT_OPS COMMANDS
===================
| **bpftool** **struct_ops { show | list }** [*STRUCT_OPS_MAP*]
| **bpftool** **struct_ops dump** [*STRUCT_OPS_MAP*]
| **bpftool** **struct_ops register** *OBJ*
| **bpftool** **struct_ops unregister** *STRUCT_OPS_MAP*
| **bpftool** **struct_ops help**
|
| *STRUCT_OPS_MAP* := { **id** *STRUCT_OPS_MAP_ID* | **name** *STRUCT_OPS_MAP_NAME* }
| *OBJ* := /a/file/of/bpf_struct_ops.o
DESCRIPTION
===========
**bpftool struct_ops { show | list }** [*STRUCT_OPS_MAP*]
Show brief information about the struct_ops in the system.
If *STRUCT_OPS_MAP* is specified, it shows information only
for the given struct_ops. Otherwise, it lists all struct_ops
currently existing in the system.
Output will start with struct_ops map ID, followed by its map
name and its struct_ops's kernel type.
**bpftool struct_ops dump** [*STRUCT_OPS_MAP*]
Dump details information about the struct_ops in the system.
If *STRUCT_OPS_MAP* is specified, it dumps information only
for the given struct_ops. Otherwise, it dumps all struct_ops
currently existing in the system.
**bpftool struct_ops register** *OBJ*
Register bpf struct_ops from *OBJ*. All struct_ops under
the ELF section ".struct_ops" will be registered to
its kernel subsystem.
**bpftool struct_ops unregister** *STRUCT_OPS_MAP*
Unregister the *STRUCT_OPS_MAP* from the kernel subsystem.
**bpftool struct_ops help**
Print short help message.
OPTIONS
=======
-h, --help
Print short generic help message (similar to **bpftool help**).
-V, --version
Print version number (similar to **bpftool version**).
-j, --json
Generate JSON output. For commands that cannot produce JSON, this
option has no effect.
-p, --pretty
Generate human-readable JSON output. Implies **-j**.
-d, --debug
Print all logs available, even debug-level information. This
includes logs from libbpf as well as from the verifier, when
attempting to load programs.
EXAMPLES
========
**# bpftool struct_ops show**
::
100: dctcp tcp_congestion_ops
105: cubic tcp_congestion_ops
**# bpftool struct_ops unregister id 105**
::
Unregistered tcp_congestion_ops cubic id 105
**# bpftool struct_ops register bpf_cubic.o**
::
Registered tcp_congestion_ops cubic id 110
SEE ALSO
========
**bpf**\ (2),
**bpf-helpers**\ (7),
**bpftool**\ (8),
**bpftool-prog**\ (8),
**bpftool-map**\ (8),
**bpftool-cgroup**\ (8),
**bpftool-feature**\ (8),
**bpftool-net**\ (8),
**bpftool-perf**\ (8),
**bpftool-btf**\ (8)
**bpftool-gen**\ (8)
...@@ -576,6 +576,34 @@ _bpftool() ...@@ -576,6 +576,34 @@ _bpftool()
;; ;;
esac esac
;; ;;
struct_ops)
local STRUCT_OPS_TYPE='id name'
case $command in
show|list|dump|unregister)
case $prev in
$command)
COMPREPLY=( $( compgen -W "$STRUCT_OPS_TYPE" -- "$cur" ) )
;;
id)
_bpftool_get_map_ids_for_type struct_ops
;;
name)
_bpftool_get_map_names_for_type struct_ops
;;
esac
return 0
;;
register)
_filedir
return 0
;;
*)
[[ $prev == $object ]] && \
COMPREPLY=( $( compgen -W 'register unregister show list dump help' \
-- "$cur" ) )
;;
esac
;;
map) map)
local MAP_TYPE='id pinned name' local MAP_TYPE='id pinned name'
case $command in case $command in
......
...@@ -58,7 +58,7 @@ static int do_help(int argc, char **argv) ...@@ -58,7 +58,7 @@ static int do_help(int argc, char **argv)
" %s batch file FILE\n" " %s batch file FILE\n"
" %s version\n" " %s version\n"
"\n" "\n"
" OBJECT := { prog | map | cgroup | perf | net | feature | btf | gen }\n" " OBJECT := { prog | map | cgroup | perf | net | feature | btf | gen | struct_ops }\n"
" " HELP_SPEC_OPTIONS "\n" " " HELP_SPEC_OPTIONS "\n"
"", "",
bin_name, bin_name, bin_name); bin_name, bin_name, bin_name);
...@@ -221,6 +221,7 @@ static const struct cmd cmds[] = { ...@@ -221,6 +221,7 @@ static const struct cmd cmds[] = {
{ "feature", do_feature }, { "feature", do_feature },
{ "btf", do_btf }, { "btf", do_btf },
{ "gen", do_gen }, { "gen", do_gen },
{ "struct_ops", do_struct_ops },
{ "version", do_version }, { "version", do_version },
{ 0 } { 0 }
}; };
......
...@@ -161,6 +161,7 @@ int do_tracelog(int argc, char **arg); ...@@ -161,6 +161,7 @@ int do_tracelog(int argc, char **arg);
int do_feature(int argc, char **argv); int do_feature(int argc, char **argv);
int do_btf(int argc, char **argv); int do_btf(int argc, char **argv);
int do_gen(int argc, char **argv); int do_gen(int argc, char **argv);
int do_struct_ops(int argc, char **argv);
int parse_u32_arg(int *argc, char ***argv, __u32 *val, const char *what); int parse_u32_arg(int *argc, char ***argv, __u32 *val, const char *what);
int prog_parse_fd(int *argc, char ***argv); int prog_parse_fd(int *argc, char ***argv);
......
This diff is collapsed.
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment