Commits · 51199405f967207de372d9b60989eb87d7ae8809 · nexedi / linux

20 Dec, 2018 14 commits

bpf: skb_verdict, support SK_PASS on RX BPF path · 51199405

John Fastabend authored Dec 20, 2018

Add SK_PASS verdict support to SK_SKB_VERDICT programs. Now that
support for redirects exists we can implement SK_PASS as a redirect
to the same socket. This simplifies the BPF programs and avoids an
extra map lookup on RX path for simple visibility cases.

Further, reduces user (BPF programmer in this context) confusion
when their program drops skb due to lack of support.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

51199405

bpf: skmsg, replace comments with BUILD bug · 7a69c0f2

John Fastabend authored Dec 20, 2018

Enforce comment on structure layout dependency with a BUILD_BUG_ON
to ensure the condition is maintained.
Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

7a69c0f2

bpf: sk_msg, improve offset chk in _is_valid_access · bc1b4f01

John Fastabend authored Dec 20, 2018

The check for max offset in sk_msg_is_valid_access uses sizeof()
which is incorrect because it would allow accessing possibly
past the end of the struct in the padded case. Further, it doesn't
preclude accessing any padding that may be added in the middle of
a struct. All told this makes it fragile to rely on.

To fix this explicitly check offsets with fields using the
bpf_ctx_range() and bpf_ctx_range_till() macros.

For reference the current structure layout looks as follows (reported
by pahole)

struct sk_msg_md {
	union {
		void *             data;                 /*           8 */
	};                                               /*     0     8 */
	union {
		void *             data_end;             /*           8 */
	};                                               /*     8     8 */
	__u32                      family;               /*    16     4 */
	__u32                      remote_ip4;           /*    20     4 */
	__u32                      local_ip4;            /*    24     4 */
	__u32                      remote_ip6[4];        /*    28    16 */
	__u32                      local_ip6[4];         /*    44    16 */
	__u32                      remote_port;          /*    60     4 */
	/* --- cacheline 1 boundary (64 bytes) --- */
	__u32                      local_port;           /*    64     4 */
	__u32                      size;                 /*    68     4 */

	/* size: 72, cachelines: 2, members: 10 */
	/* last cacheline: 8 bytes */
};

So there should be no padding at the moment but fixing this now
prevents future errors.
Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

bc1b4f01

bpf: sk_msg, fix sk_msg_md access past end test · 9ee79a65

John Fastabend authored Dec 20, 2018

Currently, the test to ensure reads past the end of the sk_msg_md
data structure fail is incorrectly expecting success. Fix this
typo and use correct expected error.

Fixes: 945a47d8 ("bpf: sk_msg, add tests for size field")
Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

9ee79a65

bpf/cpumap: make sure frame_size for build_skb is aligned if headroom isn't · 77ea5f4c

Jesper Dangaard Brouer authored Dec 19, 2018

The frame_size passed to build_skb must be aligned, else it is
possible that the embedded struct skb_shared_info gets unaligned.

For correctness make sure that xdpf->headroom in included in the
alignment. No upstream drivers can hit this, as all XDP drivers provide
an aligned headroom. This was discovered when playing with implementing
XDP support for mvneta, which have a 2 bytes DSA header, and this
Marvell ARM64 platform didn't like doing atomic operations on an
unaligned skb_shinfo(skb)->dataref addresses.

Fixes: 1c601d82 ("bpf: cpumap xdp_buff to skb conversion and allocation")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

77ea5f4c

Merge branch 'bpf-jset-verifier' · d70f4ece

Daniel Borkmann authored Dec 20, 2018

Jakub Kicinski says:

====================
This is a v2 of the patch set to teach the verifier about BPF_JSET
instruction.  There is also a number of tests include for both
basic functioning of the instruction and the verifier logic.
The NFP JIT handling of JSET is tweaked.  Last patch adds missing
file to gitignore.

Reposting part of previous series without the dead code elimination.
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

d70f4ece

selftests: bpf: add missing executables to .gitignore · 489c066c

Jakub Kicinski authored Dec 19, 2018

commit 435f90a3 ("selftests/bpf: add a test case for sock_ops
perf-event notification") missed adding new test to gitignore.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

489c066c

nfp: bpf: optimize codegen for JSET with a constant · 4987eacc

Jakub Kicinski authored Dec 19, 2018

The top word of the constant can only have bits set if sign
extension set it to all-1, therefore we don't really have to
mask the top half of the register.  We can just OR it into
the result as is.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

4987eacc

nfp: bpf: remove the trivial JSET optimization · 6e774845

Jakub Kicinski authored Dec 19, 2018

The verifier will now understand the JSET instruction, so don't
mark the dead branch in the JIT as noop.  We won't generate any
code, anyway.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

6e774845

bpf: verifier: reorder stack size check with dead code sanitization · 9b38c405

Jakub Kicinski authored Dec 19, 2018

Reorder the calls to check_max_stack_depth() and sanitize_dead_code()
to separate functions which can rewrite instructions from pure checks.

No functional changes.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

9b38c405

selftests: bpf: verifier: add tests for JSET interpretation · 14507e35

Jakub Kicinski authored Dec 19, 2018

Validate that the verifier reasons correctly about the bounds
and removes dead code based on results of JSET instruction.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

14507e35

bpf: verifier: teach the verifier to reason about the BPF_JSET instruction · 960ea056

Jakub Kicinski authored Dec 19, 2018

Some JITs (nfp) try to optimize code on their own.  It could make
sense in case of BPF_JSET instruction which is currently not interpreted
by the verifier, meaning for instance that dead could would not be
detected if it was under BPF_JSET branch.

Teach the verifier basics of BPF_JSET, JIT optimizations will be
removed shortly.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

960ea056

selftests: bpf: add trivial JSET tests · 5a8d5209

Jakub Kicinski authored Dec 19, 2018

We seem to have no JSET instruction test, and LLVM does not
generate it at all, so let's add a simple hand-coded test
to make sure JIT implementations are correct.

v2:
 - extend test_verifier to handle multiple inputs and
   add the sample there (Daniel)
 - add a sign extension case
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

5a8d5209

bpf: sparc64: Enable sparc64 jit to provide bpf_line_info · 9df95e8e

Martin KaFai Lau authored Dec 19, 2018

This patch enables sparc64's bpf_int_jit_compile() to provide
bpf_line_info by calling bpf_prog_fill_jited_linfo().
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

9df95e8e

19 Dec, 2018 5 commits

Merge branch 'line_info-check-for-ld_imm64' · 6f1f78ef

Alexei Starovoitov authored Dec 19, 2018

Martin KaFai Lau says:

====================
This series ensures the line_info (passed by the userspace during
bpf_prog_load) cannot have its line_info.insn_off pointing to a
zero bpf insn code.  F.e. a broken userspace tool might
generate a line_info.insn_off that points to the second
8 bytes of a BPF_LD_IMM64.

The first patch is the kernel change.
The second patch is a new test case.
====================
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

6f1f78ef

bpf: Add BPF_LD_IMM64 to the line_info test · e30f5640

Martin KaFai Lau authored Dec 19, 2018

This patch adds a BPF_LD_IMM64 case to the line_info test
to ensure the kernel rejects linfo_info.insn_off pointing
to the 2nd 8 bytes of the BPF_LD_IMM64.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

e30f5640

bpf: Ensure line_info.insn_off cannot point to insn with zero code · fdbaa0be

Martin KaFai Lau authored Dec 19, 2018

This patch rejects a line_info if the bpf insn code referred by
line_info.insn_off is 0. F.e. a broken userspace tool might generate
a line_info.insn_off that points to the second 8 bytes of a BPF_LD_IMM64.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

fdbaa0be

tools: bpftool: do not force gcc as CC · 9e88b931

Ivan Babrou authored Dec 19, 2018

This allows transparent cross-compilation with CROSS_COMPILE by
relying on 7ed1c190 ("tools: fix cross-compile var clobbering").
Signed-off-by: Ivan Babrou <ivan@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

9e88b931

xsk: simplify AF_XDP socket teardown · e2ce3674

Björn Töpel authored Dec 19, 2018

Prior this commit, when the struct socket object was being released,
the UMEM did not have its reference count decreased. Instead, this was
done in the struct sock sk_destruct function.

There is no reason to keep the UMEM reference around when the socket
is being orphaned, so in this patch the xdp_put_mem is called in the
xsk_release function. This results in that the xsk_destruct function
can be removed!

Note that, it still holds that a struct xsk_sock reference might still
linger in the XSKMAP after the UMEM is released, e.g. if a user does
not clear the XSKMAP prior to closing the process. This sock will be
in a "released" zombie like state, until the XSKMAP is removed.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

e2ce3674

18 Dec, 2018 21 commits

bpf: log struct/union attribute for forward type · 76c43ae8

Yonghong Song authored Dec 18, 2018

Current btf internal verbose logger logs fwd type as
  [2] FWD A type_id=0
where A is the type name.

Commit 9d5f9f70 ("bpf: btf: fix struct/union/fwd types
with kind_flag") introduced kind_flag which can be used
to distinguish whether a forward type is a struct or
union.

Also, "type_id=0" does not carry any meaningful
information for fwd type as btf_type.type = 0 is simply
enforced during btf verification and is not used
anywhere else.

This commit changed the log to
  [2] FWD A struct
if kind_flag = 0, or
  [2] FWD A union
if kind_flag = 1.
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

76c43ae8

Merge branch 'bpf-sk-msg-size-member' · dd4bfda9

Daniel Borkmann authored Dec 19, 2018

John Fastabend says:

====================
This adds a size field to the sk_msg_md data structure used by SK_MSG
programs. Without this in the zerocopy case and in the copy case
where multiple iovs are in use its difficult to know how much data
can be pulled in. The normal method of reading data and data_end
only give the current contiguous buffer. BPF programs can attempt to
pull in extra data but have to guess if it exists. This can result
in multiple "guesses" its much better if we know upfront the size
of the sk_msg.
====================
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

dd4bfda9

bpf: sk_msg, add tests for size field · 945a47d8

John Fastabend authored Dec 16, 2018

This adds tests to read the size field to test_verifier.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

945a47d8

bpf: add tools lib/include support sk_msg_md size field · 584e4681

John Fastabend authored Dec 16, 2018

Add the size field to sk_msg_md for tools.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

584e4681

bpf: sockmap, metadata support for reporting size of msg · 3bdbd022

John Fastabend authored Dec 16, 2018

This adds metadata to sk_msg_md for BPF programs to read the sk_msg
size.

When the SK_MSG program is running under an application that is using
sendfile the data is not copied into sk_msg buffers by default. Rather
the BPF program uses sk_msg_pull_data to read the bytes in. This
avoids doing the costly memcopy instructions when they are not in
fact needed. However, if we don't know the size of the sk_msg we
have to guess if needed bytes are available by doing a pull request
which may fail. By including the size of the sk_msg BPF programs can
check the size before issuing sk_msg_pull_data requests.

Additionally, the same applies for sendmsg calls when the application
provides multiple iovs. Here the BPF program needs to pull in data
to update data pointers but its not clear where the data ends without
a size parameter. In many cases "guessing" is not easy to do
and results in multiple calls to pull and without bounded loops
everything gets fairly tricky.

Clean this up by including a u32 size field. Note, all writes into
sk_msg_md are rejected already from sk_msg_is_valid_access so nothing
additional is needed there.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

3bdbd022

bpf: correct slot_type marking logic to allow more stack slot sharing · 0bae2d4d

Jiong Wang authored Dec 15, 2018

Verifier is supposed to support sharing stack slot allocated to ptr with
SCALAR_VALUE for privileged program. However this doesn't happen for some
cases.

The reason is verifier is not clearing slot_type STACK_SPILL for all bytes,
it only clears part of them, while verifier is using:

  slot_type[0] == STACK_SPILL

as a convention to check one slot is ptr type.

So, the consequence of partial clearing slot_type is verifier could treat a
partially overridden ptr slot, which should now be a SCALAR_VALUE slot,
still as ptr slot, and rejects some valid programs.

Before this patch, test_xdp_noinline.o under bpf selftests, bpf_lxc.o and
bpf_netdev.o under Cilium bpf repo, when built with -mattr=+alu32 are
rejected due to this issue. After this patch, they all accepted.

There is no processed insn number change before and after this patch on
Cilium bpf programs.
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

0bae2d4d

bpf: support raw tracepoints in modules · a38d1107

Matt Mullins authored Dec 12, 2018

Distributions build drivers as modules, including network and filesystem
drivers which export numerous tracepoints.  This enables
bpf(BPF_RAW_TRACEPOINT_OPEN) to attach to those tracepoints.
Signed-off-by: Matt Mullins <mmullins@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

a38d1107

Merge branch 'bpf-bpftool-mount-tracefs' · a137401d

Daniel Borkmann authored Dec 18, 2018

Quentin Monnet says:

====================
This series focus on mounting (or not mounting) tracefs with bpftool.

First patch makes bpftool attempt to mount tracefs if tracefs is not
found when running "bpftool prog tracelog".

Second patch adds an option to bpftool to prevent it from attempting
to mount any file system (tracefs or bpffs), in case this behaviour
is undesirable for some users.
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

a137401d

tools: bpftool: add an option to prevent auto-mount of bpffs, tracefs · 33221307

Quentin Monnet authored Dec 18, 2018

In order to make life easier for users, bpftool automatically attempts
to mount the BPF virtual file system, if it is not mounted already,
before trying to pin objects in it. Similarly, it attempts to mount
tracefs if necessary before trying to dump the trace pipe to the
console.

While mounting file systems on-the-fly can improve user experience, some
administrators might prefer to avoid that. Let's add an option to block
these mount attempts. Note that it does not prevent automatic mounting
of tracefs by debugfs for the "bpftool prog tracelog" command.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

33221307

tools: bpftool: attempt to mount tracefs if required for tracelog cmd · be3245e2

Quentin Monnet authored Dec 18, 2018

As a follow-up to commit 30da46b5 ("tools: bpftool: add a command to
dump the trace pipe"), attempt to mount the tracefs virtual file system
if it is not detected on the system before trying to dump content of the
tracing pipe on an invocation of "bpftool prog tracelog".

Usually, tracefs in automatically mounted by debugfs when the user tries
to access it (e.g. "ls /sys/kernel/debug/tracing" mounts the tracefs).
So if we failed to find it, it is probably that debugfs is not here
either. Therefore, we just attempt a single mount, at a location that
does not involve debugfs: /sys/kernel/tracing.
Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

be3245e2

tools/bpf: check precise {func, line, jited_line}_info_rec_size in test_btf · 0d7410ea

Yonghong Song authored Dec 17, 2018

Current btf func_info, line_info and jited_line are designed to be
extensible. The record sizes for {func,line}_info are passed to kernel,
and the record sizes for {func,line,jited_line}_info are returned to
userspace during bpf_prog_info query.

In bpf selftests test_btf.c, when testing whether kernel returns
a legitimate {func,line, jited_line)_info rec_size, the test only
compares to the minimum allowed size. If the returned rec_size is smaller
than the minimum allowed size, it is considered incorrect.
The minimum allowed size for these three info sizes are equal to
current value of sizeof(struct bpf_func_info), sizeof(struct bpf_line_info)
and sizeof(__u64).

The original thinking was that in the future when rec_size is increased
in kernel, the same test should run correctly. But this sacrificed
the precision of testing under the very kernel the test is shipped with,
and bpf selftest is typically run with the same repo kernel.

So this patch changed the testing of rec_size such that the
kernel returned value should be equal to the size defined by
tools uapi header bpf.h which syncs with kernel uapi header.

Martin discovered a bug in one of rec_size comparisons.
Instead of comparing to minimum func_info rec_size 8, it compares to 4.
This patch fixed that issue as well.

Fixes: 999d82cb ("tools/bpf: enhance test_btf file testing to test func info")
Fixes: 05687352 ("bpf: Refactor and bug fix in test_func_type in test_btf.c")
Fixes: 4d6304c7 ("bpf: Add unit tests for bpf_line_info")
Suggested-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

0d7410ea

bpf: libbpf: fix memleak by freeing line_info · 07a09d1b

Prashant Bhole authored Dec 17, 2018

This patch fixes a memory leak in libbpf by freeing up line_info
member of struct bpf_program while unloading a program.

Fixes: 3d650141 ("bpf: libbpf: Add btf_line_info support to libbpf")
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

07a09d1b

Merge branch 'bpf-btf-type-fixes' · 37c7b1ca

Daniel Borkmann authored Dec 18, 2018

Yonghong Song says:

====================
Commit 69b693f0 ("bpf: btf: Introduce BPF Type Format (BTF)")
introduced BTF, a debug info format for BTF.

The original design has a couple of issues though.
First, the bitfield size is only encoded in int type.
If the struct member bitfield type is enum, pahole ([1])
or llvm is forced to replace enum with int type. As a result, the original
type information gets lost.

Second, the original BTF design does not envision the possibility of
BTF=>header_file conversion ([2]), hence does not encode "struct" or
"union" info for a forward type. Such information is necessary to
convert BTF to a header file.

This patch set fixed the issue by introducing kind_flag, using one bit
in type->info. When kind_flag, the struct/union btf_member->offset
will encode both bitfield_size and bit_offset, covering both
int and enum base types. The kind_flag is also used to indicate whether
the forward type is a union (when set) or a struct.

Patch #1 refactors function btf_int_bits_seq_show() so Patch #2
can reuse part of the function.
Patch #2 implemented kind_flag support for struct/union/fwd types.
Patch #3 added kind_flag support for cgroup local storage map pretty print.
Patch #4 syncs kernel uapi btf.h to tools directory.
Patch #5 added unit tests for kind_flag.
Patch #6 added tests for kernel bpffs based pretty print with kind_flag.
Patch #7 refactors function btf_dumper_int_bits() so Patch #8
can reuse part of the function.
Patch #8 added bpftool support of pretty print with kind_flag set.

  [1] https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?id=b18354f64cc215368c3bc0df4a7e5341c55c378c
  [2] https://lwn.net/SubscriberLink/773198/fe3074838f5c3f26/

Change logs:
  v2 -> v3:
    . Relocated comments about bitfield_size/bit_offset interpretation
      of the "offset" field right before the "offset" struct member.
    . Added missing byte alignment checking for non-bitfield enum
      member of a struct with kind_flag set.
    . Added two test cases in unit tests for struct type, kind_flag set,
      non-bitfield int/enum member, not-byte aligned bit offsets.
    . Added comments to help understand there is no overflow for
      total_bits_offset in bpftool function btf_dumper_int_bits().
    . Added explanation of typedef type dumping fix in Patch #8 commit
      message.

  v1 -> v2:
    . If kind_flag is set for a structure, ensure an int member,
      whether it is a bitfield or not, is a regular int type.
    . Added support so cgroup local storage map pretty print
      works with kind_flag.
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

37c7b1ca

tools: bpftool: support pretty print with kind_flag set · 8772c8bc

Yonghong Song authored Dec 15, 2018

The following example shows map pretty print with structures
which include bitfield members.

  enum A { A1, A2, A3, A4, A5 };
  typedef enum A ___A;
  struct tmp_t {
       char a1:4;
       int  a2:4;
       int  :4;
       __u32 a3:4;
       int b;
       ___A b1:4;
       enum A b2:4;
  };
  struct bpf_map_def SEC("maps") tmpmap = {
       .type = BPF_MAP_TYPE_ARRAY,
       .key_size = sizeof(__u32),
       .value_size = sizeof(struct tmp_t),
       .max_entries = 1,
  };
  BPF_ANNOTATE_KV_PAIR(tmpmap, int, struct tmp_t);

and the following map update in the bpf program:

  key = 0;
  struct tmp_t t = {};
  t.a1 = 2;
  t.a2 = 4;
  t.a3 = 6;
  t.b = 7;
  t.b1 = 8;
  t.b2 = 10;
  bpf_map_update_elem(&tmpmap, &key, &t, 0);

With this patch, I am able to print out the map values
correctly with this patch:
bpftool map dump id 187
  [{
        "key": 0,
        "value": {
            "a1": 0x2,
            "a2": 0x4,
            "a3": 0x6,
            "b": 7,
            "b1": 0x8,
            "b2": 0xa
        }
    }
  ]

Previously, if a function prototype argument has a typedef
type, the prototype is not printed since
function __btf_dumper_type_only() bailed out with error
if the type is a typedef. This commit corrected this
behavior by printing out typedef properly.

The following example shows forward type and
typedef type can be properly printed in function prototype
with modified test_btf_haskv.c.

  struct t;
  union  u;

  __attribute__((noinline))
  static int test_long_fname_1(struct dummy_tracepoint_args *arg,
                               struct t *p1, union u *p2,
                               __u32 unused)
  ...
  int _dummy_tracepoint(struct dummy_tracepoint_args *arg) {
    return test_long_fname_1(arg, 0, 0, 0);
  }

  $ bpftool p d xlated id 24
  ...
  int test_long_fname_1(struct dummy_tracepoint_args * arg,
                        struct t * p1, union u * p2,
                        __u32 unused)
  ...
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

8772c8bc

tools: bpftool: refactor btf_dumper_int_bits() · 9f95e37e

Yonghong Song authored Dec 15, 2018

The core dump funcitonality in btf_dumper_int_bits() is
refactored into a separate function btf_dumper_bitfield()
which will be used by the next patch.
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

9f95e37e

tools/bpf: test kernel bpffs map pretty print with struct kind_flag · d0ebce68

Yonghong Song authored Dec 15, 2018

The new tests are added to test bpffs map pretty print in kernel with kind_flag
for structure type.

  $ test_btf -p
  ......
  BTF pretty print array(#1)......OK
  BTF pretty print array(#2)......OK
  PASS:8 SKIP:0 FAIL:0
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

d0ebce68

tools/bpf: add test_btf unit tests for kind_flag · cd9de5d3

Yonghong Song authored Dec 15, 2018

This patch added unit tests for different types handling
type->info.kind_flag. The following new tests are added:
  $ test_btf
  ...
  BTF raw test[82] (invalid int kind_flag): OK
  BTF raw test[83] (invalid ptr kind_flag): OK
  BTF raw test[84] (invalid array kind_flag): OK
  BTF raw test[85] (invalid enum kind_flag): OK
  BTF raw test[86] (valid fwd kind_flag): OK
  BTF raw test[87] (invalid typedef kind_flag): OK
  BTF raw test[88] (invalid volatile kind_flag): OK
  BTF raw test[89] (invalid const kind_flag): OK
  BTF raw test[90] (invalid restrict kind_flag): OK
  BTF raw test[91] (invalid func kind_flag): OK
  BTF raw test[92] (invalid func_proto kind_flag): OK
  BTF raw test[93] (valid struct kind_flag, bitfield_size = 0): OK
  BTF raw test[94] (valid struct kind_flag, int member, bitfield_size != 0): OK
  BTF raw test[95] (valid union kind_flag, int member, bitfield_size != 0): OK
  BTF raw test[96] (valid struct kind_flag, enum member, bitfield_size != 0): OK
  BTF raw test[97] (valid union kind_flag, enum member, bitfield_size != 0): OK
  BTF raw test[98] (valid struct kind_flag, typedef member, bitfield_size != 0): OK
  BTF raw test[99] (valid union kind_flag, typedef member, bitfield_size != 0): OK
  BTF raw test[100] (invalid struct type, bitfield_size greater than struct size): OK
  BTF raw test[101] (invalid struct type, kind_flag bitfield base_type int not regular): OK
  BTF raw test[102] (invalid struct type, kind_flag base_type int not regular): OK
  BTF raw test[103] (invalid union type, bitfield_size greater than struct size): OK
  ...
  PASS:122 SKIP:0 FAIL:0

The second parameter name of macro
  BTF_INFO_ENC(kind, root, vlen)
in selftests test_btf.c is also renamed from "root" to "kind_flag".
Note that before this patch "root" is not used and always 0.
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

cd9de5d3

tools/bpf: sync btf.h header from kernel to tools · 128b343d

Yonghong Song authored Dec 15, 2018

Sync include/uapi/linux/btf.h to tools/include/uapi/linux/btf.h.
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

128b343d

bpf: enable cgroup local storage map pretty print with kind_flag · ffa0c1cf

Yonghong Song authored Dec 15, 2018

Commit 970289fc0a83 ("bpf: add bpffs pretty print for cgroup
local storage maps") added bpffs pretty print for cgroup
local storage maps. The commit worked for struct without kind_flag
set.

This patch refactored and made pretty print also work
with kind_flag set for the struct.
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

ffa0c1cf

bpf: btf: fix struct/union/fwd types with kind_flag · 9d5f9f70

Yonghong Song authored Dec 15, 2018

This patch fixed two issues with BTF. One is related to
struct/union bitfield encoding and the other is related to
forward type.

Issue #1 and solution:

======================

Current btf encoding of bitfield follows what pahole generates.
For each bitfield, pahole will duplicate the type chain and
put the bitfield size at the final int or enum type.
Since the BTF enum type cannot encode bit size,
pahole workarounds the issue by generating
an int type whenever the enum bit size is not 32.

For example,
  -bash-4.4$ cat t.c
  typedef int ___int;
  enum A { A1, A2, A3 };
  struct t {
    int a[5];
    ___int b:4;
    volatile enum A c:4;
  } g;
  -bash-4.4$ gcc -c -O2 -g t.c
The current kernel supports the following BTF encoding:
  $ pahole -JV t.o
  [1] TYPEDEF ___int type_id=2
  [2] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED
  [3] ENUM A size=4 vlen=3
        A1 val=0
        A2 val=1
        A3 val=2
  [4] STRUCT t size=24 vlen=3
        a type_id=5 bits_offset=0
        b type_id=9 bits_offset=160
        c type_id=11 bits_offset=164
  [5] ARRAY (anon) type_id=2 index_type_id=2 nr_elems=5
  [6] INT sizetype size=8 bit_offset=0 nr_bits=64 encoding=(none)
  [7] VOLATILE (anon) type_id=3
  [8] INT int size=1 bit_offset=0 nr_bits=4 encoding=(none)
  [9] TYPEDEF ___int type_id=8
  [10] INT (anon) size=1 bit_offset=0 nr_bits=4 encoding=SIGNED
  [11] VOLATILE (anon) type_id=10

Two issues are in the above:
  . by changing enum type to int, we lost the original
    type information and this will not be ideal later
    when we try to convert BTF to a header file.
  . the type duplication for bitfields will cause
    BTF bloat. Duplicated types cannot be deduplicated
    later if the bitfield size is different.

To fix this issue, this patch implemented a compatible
change for BTF struct type encoding:
  . the bit 31 of struct_type->info, previously reserved,
    now is used to indicate whether bitfield_size is
    encoded in btf_member or not.
  . if bit 31 of struct_type->info is set,
    btf_member->offset will encode like:
      bit 0 - 23: bit offset
      bit 24 - 31: bitfield size
    if bit 31 is not set, the old behavior is preserved:
      bit 0 - 31: bit offset

So if the struct contains a bit field, the maximum bit offset
will be reduced to (2^24 - 1) instead of MAX_UINT. The maximum
bitfield size will be 256 which is enough for today as maximum
bitfield in compiler can be 128 where int128 type is supported.

This kernel patch intends to support the new BTF encoding:
  $ pahole -JV t.o
  [1] TYPEDEF ___int type_id=2
  [2] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED
  [3] ENUM A size=4 vlen=3
        A1 val=0
        A2 val=1
        A3 val=2
  [4] STRUCT t kind_flag=1 size=24 vlen=3
        a type_id=5 bitfield_size=0 bits_offset=0
        b type_id=1 bitfield_size=4 bits_offset=160
        c type_id=7 bitfield_size=4 bits_offset=164
  [5] ARRAY (anon) type_id=2 index_type_id=2 nr_elems=5
  [6] INT sizetype size=8 bit_offset=0 nr_bits=64 encoding=(none)
  [7] VOLATILE (anon) type_id=3

Issue #2 and solution:
======================

Current forward type in BTF does not specify whether the original
type is struct or union. This will not work for type pretty print
and BTF-to-header-file conversion as struct/union must be specified.
  $ cat tt.c
  struct t;
  union u;
  int foo(struct t *t, union u *u) { return 0; }
  $ gcc -c -g -O2 tt.c
  $ pahole -JV tt.o
  [1] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED
  [2] FWD t type_id=0
  [3] PTR (anon) type_id=2
  [4] FWD u type_id=0
  [5] PTR (anon) type_id=4

To fix this issue, similar to issue #1, type->info bit 31
is used. If the bit is set, it is union type. Otherwise, it is
a struct type.

  $ pahole -JV tt.o
  [1] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED
  [2] FWD t kind_flag=0 type_id=0
  [3] PTR (anon) kind_flag=0 type_id=2
  [4] FWD u kind_flag=1 type_id=0
  [5] PTR (anon) kind_flag=0 type_id=4

Pahole/LLVM change:
===================

The new kind_flag functionality has been implemented in pahole
and llvm:
  https://github.com/yonghong-song/pahole/tree/bitfield
  https://github.com/yonghong-song/llvm/tree/bitfield

Note that pahole hasn't implemented func/func_proto kind
and .BTF.ext. So to print function signature with bpftool,
the llvm compiler should be used.

Fixes: 69b693f0 ("bpf: btf: Introduce BPF Type Format (BTF)")
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

9d5f9f70

bpf: btf: refactor btf_int_bits_seq_show() · f97be3ab

Yonghong Song authored Dec 15, 2018

Refactor function btf_int_bits_seq_show() by creating
function btf_bitfield_seq_show() which has no dependence
on btf and btf_type. The function btf_bitfield_seq_show()
will be in later patch to directly dump bitfield member values.
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

f97be3ab