1. 11 Apr, 2023 13 commits
    • bpf: Add log_true_size output field to return necessary log buffer size · 47a71c1f
      Andrii Nakryiko authored
      Add output-only log_true_size and btf_log_true_size fields to
      BPF_PROG_LOAD and BPF_BTF_LOAD commands, respectively. They return the
      size of the log buffer necessary to fit all the log contents at the
      specified log_level. This is very useful for BPF loader libraries like
      libbpf to size the log buffer correctly, but it can also be used by
      users directly, if necessary.
      
      This patch plumbs all this through the code, taking into account the
      actual bpf_attr size provided by the user to determine whether these new
      fields are expected by users. If they are, the kernel sets them on return.
      
      We refactor the btf_parse() function to accommodate this, moving attr and
      uattr handling inside it. The rest is very straightforward code, which is
      split from the logging accounting changes in the previous patch to make it
      simpler to review logic vs. UAPI changes.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Lorenz Bauer <lmb@isovalent.com>
      Link: https://lore.kernel.org/bpf/20230406234205.323208-13-andrii@kernel.org
    • bpf: Keep track of total log content size in both fixed and rolling modes · fa1c7d5c
      Andrii Nakryiko authored
      Change how we do accounting in BPF_LOG_FIXED mode and adopt log->end_pos
      as the *logical* log position. This means that we can now go beyond the
      physical log buffer size and tell what the log buffer size should be to
      fit the entire log contents without -ENOSPC.
      
      To do this for BPF_LOG_FIXED mode, we need to remove the short-circuiting
      logic of not vsnprintf()'ing further log content once we have filled up the
      user-provided buffer, which is done by the bpf_verifier_log_needed() checks.
      We modify these checks to always keep going if log->level is non-zero
      (i.e., a log is requested), even if log->ubuf was NULL'ed out due to an
      error while copying data to user-space, or if the entire log buffer is
      physically full. We adapt the bpf_verifier_vlog() routine to work correctly
      with the log->ubuf == NULL condition, performing log formatting into a
      temporary kernel buffer and doing all the necessary accounting, but just
      avoiding copying data out if the buffer is full or NULL'ed out.
      
      With these changes, it's now possible to do this sort of determination of
      the log contents size in both BPF_LOG_FIXED and the default rolling log
      mode. We need to keep in mind bpf_vlog_reset(), though, which shrinks the
      log contents after successful verification of a particular code path. This
      log reset means that log->end_pos isn't always increasing, so to report
      back to users what the log buffer size should be to fit all log content
      without causing -ENOSPC even in the presence of log resetting, we need to
      keep the maximum over the "lifetime" of logging. We do this accounting in
      the bpf_vlog_update_len_max() helper.
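      
      The accounting idea, roughly (an illustrative sketch based on the
      description above, not the exact kernel code; the struct is trimmed to the
      fields discussed here):
      
      ```
      #include <limits.h>
      
      struct vlog_sketch {
              unsigned long long end_pos; /* logical write position; may exceed the buffer */
              unsigned int len_max;       /* max log size ever needed, kept across resets  */
      };
      
      static void vlog_update_len_max(struct vlog_sketch *log, unsigned int add_len)
      {
              /* length the log would need if this chunk is appended */
              unsigned long long len = log->end_pos + add_len;
      
              if (len > UINT_MAX)          /* UAPI keeps log sizes 32-bit, so cap at 4GB */
                      log->len_max = UINT_MAX;
              else if (len > log->len_max) /* keep the lifetime maximum, since           */
                      log->len_max = len;  /* bpf_vlog_reset() can shrink end_pos later  */
      }
      ```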
      
      A related and subtle aspect is that with this logical log->end_pos, even in
      BPF_LOG_FIXED mode we could temporarily "overflow" the buffer, but then
      reset it back with bpf_vlog_reset() to a position inside the user-supplied
      log_buf. In such a situation we still want to properly maintain the
      terminating zero. We will eventually return -ENOSPC even if the final log
      contents turn out to be small (we detect this through the log->len_max
      check). This behavior is simpler to reason about and is consistent with the
      current behavior of the verifier log. Handling this required a small
      addition to the bpf_vlog_reset() logic to avoid doing put_user() beyond the
      physical log buffer dimensions.
      
      Another issue to keep in mind is that we limit the log buffer size to a
      32-bit value and keep such log length as u32, but theoretically the
      verifier could produce a huge log stretching beyond 4GB. Instead of
      keeping (and later returning) a 64-bit log length, we cap it at UINT_MAX.
      The current UAPI makes it impossible to specify a log buffer size bigger
      than 4GB anyway, so we don't really lose anything here and keep everything
      consistently 32-bit in UAPI. This property will be utilized in the next
      patch.
      
      Doing the same determination of maximum log buffer for rolling mode is
      trivial, as log->end_pos and log->start_pos are already logical
      positions, so there is nothing new there.
      
      These changes do incidentally fix one small issue with the previous
      logging logic. Previously, if the user provided a log buffer of size N,
      and the actual log output was exactly N-1 bytes + terminating \0, the
      kernel logic couldn't distinguish this condition from the log truncation
      scenario, which would end up with truncated log contents of N-1 bytes +
      terminating \0 as well.
      
      But now, with log->end_pos being a logical position that can go beyond the
      actual log buffer size, we can distinguish these two conditions, which we
      do in this patch. This plays nicely with returning log_size_actual
      (implemented in UAPI in the next patch), as we can now guarantee that if
      the user takes such log_size_actual and provides a log buffer of that
      exact size, they will not get -ENOSPC in return.
      
      All in all, these changes conceptually unify fixed and rolling log modes
      much better, and allow a nice feature requested by users: knowing what the
      size of the buffer should be to avoid -ENOSPC.
      
      We'll plumb this through the UAPI and the code in the next patch.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Lorenz Bauer <lmb@isovalent.com>
      Link: https://lore.kernel.org/bpf/20230406234205.323208-12-andrii@kernel.org
    • bpf: Simplify logging-related error conditions handling · 8a6ca6bc
      Andrii Nakryiko authored
      Move log->level == 0 check into bpf_vlog_truncated() instead of doing it
      explicitly. Also remove unnecessary goto in kernel/bpf/verifier.c.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Lorenz Bauer <lmb@isovalent.com>
      Link: https://lore.kernel.org/bpf/20230406234205.323208-11-andrii@kernel.org
    • bpf: Avoid incorrect -EFAULT error in BPF_LOG_KERNEL mode · cbedb42a
      Andrii Nakryiko authored
      If verifier log is in BPF_LOG_KERNEL mode, no log->ubuf is expected and
      it stays NULL throughout entire verification process. Don't erroneously
      return -EFAULT in such case.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Lorenz Bauer <lmb@isovalent.com>
      Link: https://lore.kernel.org/bpf/20230406234205.323208-10-andrii@kernel.org
    • bpf: Fix missing -EFAULT return on user log buf error in btf_parse() · 971fb505
      Andrii Nakryiko authored
      btf_parse() is missing -EFAULT error return if log->ubuf was NULL-ed out
      due to error while copying data into user-provided buffer. Add it, but
      handle a special case of BPF_LOG_KERNEL in which log->ubuf is always NULL.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Lorenz Bauer <lmb@isovalent.com>
      Link: https://lore.kernel.org/bpf/20230406234205.323208-9-andrii@kernel.org
    • bpf: Ignore verifier log reset in BPF_LOG_KERNEL mode · 24bc8088
      Andrii Nakryiko authored
      Verifier log position reset is meaningless in BPF_LOG_KERNEL mode, so
      just exit early in bpf_vlog_reset() if log->level is BPF_LOG_KERNEL.
      
      This avoids a meaningless put_user() into a NULL log->ubuf.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Lorenz Bauer <lmb@isovalent.com>
      Link: https://lore.kernel.org/bpf/20230406234205.323208-8-andrii@kernel.org
    • selftests/bpf: Add fixed vs rotating verifier log tests · b1a7a480
      Andrii Nakryiko authored
      Add selftests validating BPF_LOG_FIXED behavior, which used to be the
      only behavior, and the now-default rotating BPF verifier log, which
      returns just up to the last N bytes of the full verifier log, instead of
      returning -ENOSPC.
      
      To stress test the correctness of the in-kernel verifier log logic, we
      force it to truncate the program's verifier log to all lengths from 1 all
      the way to its full size (about 450 bytes today). This was a useful stress
      test while developing the feature.
      
      For both fixed and rotating log modes we expect -ENOSPC if the log
      contents don't fit in the user-supplied log buffer.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Lorenz Bauer <lmb@isovalent.com>
      Link: https://lore.kernel.org/bpf/20230406234205.323208-7-andrii@kernel.org
    • veristat: Add more veristat control over verifier log options · d0d75c67
      Andrii Nakryiko authored
      Add --log-size to be able to customize the size of the log buffer sent to
      the bpf() syscall for BPF program verification logging.
      
      Add --log-fixed to enforce BPF_LOG_FIXED behavior for the BPF verifier
      log. This is useful in the unlikely event that the beginning of the
      truncated verifier log is more important than the end of it (which, with
      the now-default rotating verifier log behavior, is what gets kept).
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20230406234205.323208-6-andrii@kernel.org
    • libbpf: Don't enforce unnecessary verifier log restrictions on libbpf side · e0aee1fa
      Andrii Nakryiko authored
      This basically prevents any forward compatibility, and either way we just
      return -EINVAL, which would otherwise be returned from the bpf() syscall
      anyway.
      
      Similarly, drop the enforcement of a non-NULL log_buf when log_level > 0.
      This will soon no longer be a requirement.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Lorenz Bauer <lmb@isovalent.com>
      Link: https://lore.kernel.org/bpf/20230406234205.323208-5-andrii@kernel.org
    • bpf: Switch BPF verifier log to be a rotating log by default · 12166409
      Andrii Nakryiko authored
      Currently, if user-supplied log buffer to collect BPF verifier log turns
      out to be too small to contain full log, bpf() syscall returns -ENOSPC,
      fails BPF program verification/load, and preserves first N-1 bytes of
      the verifier log (where N is the size of user-supplied buffer).
      
      This is problematic in a bunch of common scenarios, especially when
      working with real-world BPF programs that tend to be pretty complex as
      far as verification goes and require big log buffers. Typically, it's
      when debugging tricky cases at log level 2 (verbose). Also, when BPF program
      is successfully validated, log level 2 is the only way to actually see
      verifier state progression and all the important details.
      
      Even with log level 1, it's possible to get -ENOSPC even if the final
      verifier log fits in the log buffer, if there is a code path that's deep
      enough to fill up the entire log, even if it would normally be reset later
      on (there is logic to chop off successfully validated portions of the BPF
      verifier log).
      
      In short, it's not always possible to pre-size the log buffer. Also, what's
      worse, in practice the end of the log most often is way more important
      than the beginning, but the verifier stops emitting the log as soon as the
      initial log buffer is filled up.
      
      This patch switches BPF verifier log behavior to effectively behave as
      rotating log. That is, if user-supplied log buffer turns out to be too
      short, verifier will keep overwriting previously written log,
      effectively treating user's log buffer as a ring buffer. -ENOSPC is
      still going to be returned at the end, to notify user that log contents
      was truncated, but the important last N bytes of the log would be
      returned, which might be all that the user really needs. This consistent
      -ENOSPC behavior, regardless of rotating or fixed log mode, prevents
      backwards compatibility breakage. The only user-visible change is which
      portion of the verifier log the user ends up seeing *if the buffer is too
      small*. Given that the contents of the verifier log itself are not an ABI,
      there is no breakage due to this behavior change.
      rely on specific contents of verifier log in -ENOSPC scenario are
      expected to be easily adapted to accommodate old and new behaviors.
      
      Importantly, though, to preserve a good user experience and not require
      every user-space application to adapt to this new behavior, before exiting
      to user-space the verifier will rotate the log (in place) to make it start
      at the very beginning of the user buffer as a continuous zero-terminated
      string. The contents will, of course, be the chopped-off last N-1 bytes of
      the full verifier log.
      
      Given beginning of log is sometimes important as well, we add
      BPF_LOG_FIXED (which equals 8) flag to force old behavior, which allows
      tools like veristat to request first part of verifier log, if necessary.
      BPF_LOG_FIXED flag is also a simple and straightforward way to check if
      BPF verifier supports rotating behavior.
      
      On the implementation side, conceptually, it's all simple. We maintain
      64-bit logical start and end positions. If we need to truncate the log,
      the start position is adjusted to lag the end position by N bytes. We then
      use those logical positions to calculate their matching actual positions
      in the user buffer and properly handle wrap-around at the end of the
      buffer. Finally, right before returning from bpf_check(), we rotate the
      user log buffer contents in place as necessary, to make the log contents
      contiguous. See the comments in relevant functions for details.
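      
      The position handling can be pictured with a small user-space sketch
      (illustrative only; names and structure are simplified and do not mirror
      the kernel code):
      
      ```
      struct rolling_log {
              unsigned long long start_pos; /* logical position of the oldest retained byte */
              unsigned long long end_pos;   /* logical position one past the newest byte    */
              unsigned int len_total;       /* physical size of the user buffer             */
              char *buf;                    /* user buffer treated as a ring                */
      };
      
      static void rolling_log_append(struct rolling_log *log, const char *data, unsigned int n)
      {
              unsigned int i;
      
              for (i = 0; i < n; i++)
                      log->buf[(log->end_pos + i) % log->len_total] = data[i];
              log->end_pos += n;
      
              /* If we overran the buffer, truncate the log: advance start_pos so
               * it lags end_pos by at most len_total bytes. */
              if (log->end_pos - log->start_pos > log->len_total)
                      log->start_pos = log->end_pos - log->len_total;
      }
      ```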
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Lorenz Bauer <lmb@isovalent.com>
      Link: https://lore.kernel.org/bpf/20230406234205.323208-4-andrii@kernel.org
    • bpf: Remove minimum size restrictions on verifier log buffer · 03cc3aa6
      Andrii Nakryiko authored
      It's not clear why we have 128 as minimum size, but it makes testing
      harder and seems unnecessary, as we carefully handle truncation
      scenarios and use proper snprintf variants. So remove this limitation
      and just enforce positive length for log buffer.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Lorenz Bauer <lmb@isovalent.com>
      Link: https://lore.kernel.org/bpf/20230406234205.323208-3-andrii@kernel.org
    • bpf: Split off basic BPF verifier log into separate file · 4294a0a7
      Andrii Nakryiko authored
      The kernel/bpf/verifier.c file is large and growing larger all the time.
      So it's good to start splitting off more or less self-contained parts into
      separate files to keep the source code size (somewhat) under control.
      
      This patch is one step in this direction, moving some of the BPF verifier
      log routines into a separate kernel/bpf/log.c. Right now it's mostly the
      low-level and isolated routines to append data to the log, reset the log
      to a previous position, etc. Eventually we could probably move verifier
      state printing logic here as well, but this patch doesn't attempt to do
      that yet.
      
      Subsequent patches will add more logic to verifier log management, so
      having basics in a separate file will make sure verifier.c doesn't grow
      more with new changes.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Lorenz Bauer <lmb@isovalent.com>
      Link: https://lore.kernel.org/bpf/20230406234205.323208-2-andrii@kernel.org
    • bpf: Remove extra whitespace in SPDX tag for syscall/helpers man pages · eafa9215
      Alejandro Colomar authored
      There is an extra whitespace in the SPDX tag, before the license name,
      in the script for generating man pages for the bpf() syscall and the
      helpers. It has caused problems in Debian packaging, in the tool that
      autodetects licenses. Let's clean it up.
      
      Fixes: 5cb62b75 ("bpf, docs: Use SPDX license identifier in bpf_doc.py")
      Signed-off-by: Alejandro Colomar <alx@kernel.org>
      Signed-off-by: Quentin Monnet <quentin@isovalent.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20230411144747.66734-1-quentin@isovalent.com
  2. 10 Apr, 2023 1 commit
    • selftests/bpf: Reset err when symbol name already exist in kprobe_multi_test · c4d3b488
      Manu Bretelle authored
      When trying to add a name to the hashmap, an error code of EEXIST is
      returned and we continue as names are possibly duplicated in the sys
      file.
      
      If the last name in the file is a duplicate, we will continue to the
      next iteration of the while loop, and exit the loop with a value of err
      set to EEXIST and enter the error label with err set, which causes the
      test to fail when it should not.
      
      This change resets err to 0 before continuing to the next iteration; this
      way, if there is no more data to read from the file we iterate through,
      err will be 0.
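      
      The fixed control flow looks roughly like the sketch below (illustrative
      only; add_name() stands in for the hashmap insert and is not the test's
      actual helper):
      
      ```
      #include <errno.h>
      #include <stdio.h>
      
      static int collect_names(FILE *f, int (*add_name)(const char *name))
      {
              char name[128];
              int err = 0;
      
              while (fscanf(f, "%127s", name) == 1) {
                      err = add_name(name);
                      if (err) {
                              if (err != -EEXIST)
                                      return err; /* real failure */
                              /* duplicate is tolerated; clear err so it does not
                               * leak out if this was the last name in the file */
                              err = 0;
                              continue;
                      }
              }
              return err;
      }
      ```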
      
      Behaviour prior to this change:
      ```
      test_kprobe_multi_bench_attach:FAIL:get_syms unexpected error: -17
      (errno 2)
      
      All error logs:
      test_kprobe_multi_bench_attach:FAIL:get_syms unexpected error: -17
      (errno 2)
      Summary: 0/1 PASSED, 0 SKIPPED, 1 FAILED
      ```
      
      After this change:
      ```
      Summary: 1/2 PASSED, 0 SKIPPED, 0 FAILED
      ```
      Signed-off-by: Manu Bretelle <chantr4@gmail.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20230408022919.54601-1-chantr4@gmail.com
  3. 07 Apr, 2023 4 commits
  4. 06 Apr, 2023 17 commits
    • Merge branch 'bpf: Improve verifier for cond_op and spilled loop index variables' · 4daf0b32
      Alexei Starovoitov authored
      Yonghong Song says:
      
      ====================
      
      LLVM commit [1] introduced hoistMinMax optimization like
        (i < VIRTIO_MAX_SGS) && (i < out_sgs)
      to
        upper = MIN(VIRTIO_MAX_SGS, out_sgs)
        ... i < upper ...
      and caused a verification failure. Commit [2] worked around the issue by
      adding some bpf assembly code to prohibit the above optimization.
      This patch set improves the verifier such that verification can succeed
      without the above workaround.
      
      Without [2], the current verifier will hit the following failures:
        ...
        119: (15) if r1 == 0x0 goto pc+1
        The sequence of 8193 jumps is too complex.
        verification time 525829 usec
        stack depth 64
        processed 156616 insns (limit 1000000) max_states_per_insn 8 total_states 1754 peak_states 1712 mark_read 12
        -- END PROG LOAD LOG --
        libbpf: prog 'trace_virtqueue_add_sgs': failed to load: -14
        libbpf: failed to load object 'loop6.bpf.o'
        ...
      The failure is due to the verifier inadequately handling '<const> <cond_op> <non_const>',
      which will go through both paths and generate the following verification states:
        ...
        89: (07) r2 += 1                      ; R2_w=5
        90: (79) r8 = *(u64 *)(r10 -48)       ; R8_w=scalar() R10=fp0
        91: (79) r1 = *(u64 *)(r10 -56)       ; R1_w=scalar(umax=5,var_off=(0x0; 0x7)) R10=fp0
        92: (ad) if r2 < r1 goto pc+41        ; R0_w=scalar() R1_w=scalar(umin=6,umax=5,var_off=(0x4; 0x3))
            R2_w=5 R6_w=scalar(id=385) R7_w=0 R8_w=scalar() R9_w=scalar(umax=21474836475,var_off=(0x0; 0x7ffffffff))
            R10=fp0 fp-8=mmmmmmmm fp-16=mmmmmmmm fp-24=mmmm???? fp-32= fp-40_w=4 fp-48=mmmmmmmm fp-56= fp-64=mmmmmmmm
        ...
        89: (07) r2 += 1                      ; R2_w=6
        90: (79) r8 = *(u64 *)(r10 -48)       ; R8_w=scalar() R10=fp0
        91: (79) r1 = *(u64 *)(r10 -56)       ; R1_w=scalar(umax=5,var_off=(0x0; 0x7)) R10=fp0
        92: (ad) if r2 < r1 goto pc+41        ; R0_w=scalar() R1_w=scalar(umin=7,umax=5,var_off=(0x4; 0x3))
            R2_w=6 R6=scalar(id=388) R7=0 R8_w=scalar() R9_w=scalar(umax=25769803770,var_off=(0x0; 0x7ffffffff))
            R10=fp0 fp-8=mmmmmmmm fp-16=mmmmmmmm fp-24=mmmm???? fp-32= fp-40=5 fp-48=mmmmmmmm fp-56= fp-64=mmmmmmmm
          ...
        89: (07) r2 += 1                      ; R2_w=4088
        90: (79) r8 = *(u64 *)(r10 -48)       ; R8_w=scalar() R10=fp0
        91: (79) r1 = *(u64 *)(r10 -56)       ; R1_w=scalar(umax=5,var_off=(0x0; 0x7)) R10=fp0
        92: (ad) if r2 < r1 goto pc+41        ; R0=scalar() R1=scalar(umin=4089,umax=5,var_off=(0x0; 0x7))
            R2=4088 R6=scalar(id=12634) R7=0 R8=scalar() R9=scalar(umax=17557826301960,var_off=(0x0; 0xfffffffffff))
            R10=fp0 fp-8=mmmmmmmm fp-16=mmmmmmmm fp-24=mmmm???? fp-32= fp-40=4087 fp-48=mmmmmmmm fp-56= fp-64=mmmmmmmm
      
      Patch 3 fixed the above issue by handling '<const> <cond_op> <non_const>' properly.
      While developing selftests for Patch 3, I found some issues with bound deduction with
      BPF_EQ/BPF_NE and fixed them in Patch 1.
      
      After the above issue is fixed, the second issue shows up.
        ...
        67: (07) r1 += -16                    ; R1_w=fp-16
        ; bpf_probe_read_kernel(&sgp, sizeof(sgp), sgs + i);
        68: (b4) w2 = 8                       ; R2_w=8
        69: (85) call bpf_probe_read_kernel#113       ; R0_w=scalar() fp-16=mmmmmmmm
        ; return sgp;
        70: (79) r6 = *(u64 *)(r10 -16)       ; R6=scalar() R10=fp0
        ; for (n = 0, sgp = get_sgp(sgs, i); sgp && (n < SG_MAX);
        71: (15) if r6 == 0x0 goto pc-49      ; R6=scalar()
        72: (b4) w1 = 0                       ; R1_w=0
        73: (05) goto pc-46
        ; for (i = 0; (i < VIRTIO_MAX_SGS) && (i < out_sgs); i++) {
        28: (bc) w7 = w1                      ; R1_w=0 R7_w=0
        ; bpf_probe_read_kernel(&len, sizeof(len), &sgp->length);
        ...
        23: (79) r3 = *(u64 *)(r10 -40)       ; R3_w=2 R10=fp0
        ; for (i = 0; (i < VIRTIO_MAX_SGS) && (i < out_sgs); i++) {
        24: (07) r3 += 1                      ; R3_w=3
        ; for (i = 0; (i < VIRTIO_MAX_SGS) && (i < out_sgs); i++) {
        25: (79) r1 = *(u64 *)(r10 -56)       ; R1_w=scalar(umax=5,var_off=(0x0; 0x7)) R10=fp0
        26: (ad) if r3 < r1 goto pc+34 61: R0=scalar() R1_w=scalar(umin=4,umax=5,var_off=(0x4; 0x1)) R3_w=3 R6=scalar(id=1658)
           R7=0 R8=scalar(id=1653) R9=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0 fp-8=mmmmmmmm fp-16=mmmmmmmm
           fp-24=mmmm???? fp-32= fp-40=2 fp-56= fp-64=mmmmmmmm
        ; if (sg_is_chain(&sg))
        61: (7b) *(u64 *)(r10 -40) = r3       ; R3_w=3 R10=fp0 fp-40_w=3
          ...
        67: (07) r1 += -16                    ; R1_w=fp-16
        ; bpf_probe_read_kernel(&sgp, sizeof(sgp), sgs + i);
        68: (b4) w2 = 8                       ; R2_w=8
        69: (85) call bpf_probe_read_kernel#113       ; R0_w=scalar() fp-16=mmmmmmmm
        ; return sgp;
        70: (79) r6 = *(u64 *)(r10 -16)
        ; for (n = 0, sgp = get_sgp(sgs, i); sgp && (n < SG_MAX);
        infinite loop detected at insn 71
        verification time 90800 usec
        stack depth 64
        processed 25017 insns (limit 1000000) max_states_per_insn 20 total_states 491 peak_states 169 mark_read 12
        -- END PROG LOAD LOG --
        libbpf: prog 'trace_virtqueue_add_sgs': failed to load: -22
      
      Further analysis found that the index variable 'i' is spilled but not marked as precise.
      This is more tricky, as identifying an induction variable is not easy in the verifier.
      Although a heuristic is possible, let us leave it for now.
      
        [1] https://reviews.llvm.org/D143726
        [2] Commit 3c2611ba ("selftests/bpf: Fix trace_virtqueue_add_sgs test issue with LLVM 17.")
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: Add verifier tests for code pattern '<const> <cond_op> <non_const>' · 23a88fae
      Yonghong Song authored
      Add various tests for code pattern '<const> <cond_op> <non_const>' to
      exercise the previous verifier patch.
      
      The following are veristat changed number of processed insns stat
      comparing the previous patch vs. this patch:
      
      File                                                   Program                                               Insns (A)  Insns (B)  Insns  (DIFF)
      -----------------------------------------------------  ----------------------------------------------------  ---------  ---------  -------------
      test_seg6_loop.bpf.linked3.o                           __add_egr_x                                               12423      12314  -109 (-0.88%)
      
      Only one program is affected with minor change.
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Acked-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20230406164510.1047757-1-yhs@fb.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Improve handling of pattern '<const> <cond_op> <non_const>' in verifier · 953d9f5b
      Yonghong Song authored
      Currently, the verifier does not handle '<const> <cond_op> <non_const>' well.
      For example,
        ...
        10: (79) r1 = *(u64 *)(r10 -16)       ; R1_w=scalar() R10=fp0
        11: (b7) r2 = 0                       ; R2_w=0
        12: (2d) if r2 > r1 goto pc+2
        13: (b7) r0 = 0
        14: (95) exit
        15: (65) if r1 s> 0x1 goto pc+3
        16: (0f) r0 += r1
        ...
      At insn 12, the verifier decides both the true and false branches are
      possible, but actually only the false branch is possible.
      
      Currently, the verifier already supports the pattern '<non_const> <cond_op> <const>'.
      Add support for the pattern '<const> <cond_op> <non_const>' in a similar way.
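      
      One way to picture the idea (an illustrative sketch; the actual verifier
      change may be structured differently): when the constant is on the left,
      the comparison can be flipped so that the existing
      '<non_const> <cond_op> <const>' handling applies.
      
      ```
      #include <linux/bpf.h>
      
      /* 'K <op> r' is equivalent to 'r <flipped op> K' for the inequality tests;
       * equality tests are symmetric and need no change. */
      static unsigned char flip_cond_op(unsigned char op)
      {
              switch (op) {
              case BPF_JGT:  return BPF_JLT;
              case BPF_JGE:  return BPF_JLE;
              case BPF_JLT:  return BPF_JGT;
              case BPF_JLE:  return BPF_JGE;
              case BPF_JSGT: return BPF_JSLT;
              case BPF_JSGE: return BPF_JSLE;
              case BPF_JSLT: return BPF_JSGT;
              case BPF_JSLE: return BPF_JSGE;
              default:       return op; /* BPF_JEQ, BPF_JNE, ... */
              }
      }
      ```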
      
      Also fix selftest 'verifier_bounds_mix_sign_unsign/bounds checks mixing signed and unsigned, variant 10'
      due to this change.
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Acked-by: Dave Marchevsky <davemarchevsky@fb.com>
      Acked-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20230406164505.1046801-1-yhs@fb.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: Add tests for non-constant cond_op NE/EQ bound deduction · aec08d67
      Yonghong Song authored
      Add various tests for code pattern '<non-const> NE/EQ <const>' implemented
      in the previous verifier patch. Without the verifier patch, these new
      tests will fail.
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Acked-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20230406164500.1045715-1-yhs@fb.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Improve verifier JEQ/JNE insn branch taken checking · 13fbcee5
      Yonghong Song authored
      Currently, for BPF_JEQ/BPF_JNE insn, verifier determines
      whether the branch is taken or not only if both operands
      are constants. Therefore, for the following code snippet,
        0: (85) call bpf_ktime_get_ns#5       ; R0_w=scalar()
        1: (a5) if r0 < 0x3 goto pc+2         ; R0_w=scalar(umin=3)
        2: (b7) r2 = 2                        ; R2_w=2
        3: (1d) if r0 == r2 goto pc+2 6
      
      At insn 3, since r0 is not a constant, the verifier assumes both branches
      can be taken, which may lead to an improper verification failure.
      
      Add a comparison of the umin/umax values against the constant. If the umin
      value is greater than the constant, or the umax value is smaller than the
      constant, then for JEQ the branch must not be taken, and for JNE the
      branch must be taken. The jmp32 mode JEQ/JNE branch-taken checking is
      handled similarly.
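      
      In pseudo-code, the added range check is roughly as follows (an
      illustrative sketch, using a 1 = taken / 0 = not taken / -1 = unknown
      convention):
      
      ```
      #include <stdbool.h>
      
      static int jeq_jne_branch_taken(unsigned long long umin, unsigned long long umax,
                                      unsigned long long k, bool is_jne)
      {
              if (umin > k || umax < k)        /* the register can never equal k */
                      return is_jne ? 1 : 0;
              return -1;                       /* can't decide from range info alone */
      }
      ```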
      
      The following lists the veristat result w.r.t. the changed number
      of processed insns during verification:
      
      File                                                   Program                                               Insns (A)  Insns (B)  Insns    (DIFF)
      -----------------------------------------------------  ----------------------------------------------------  ---------  ---------  ---------------
      test_cls_redirect.bpf.linked3.o                        cls_redirect                                              64980      73472  +8492 (+13.07%)
      test_seg6_loop.bpf.linked3.o                           __add_egr_x                                               12425      12423      -2 (-0.02%)
      test_tcp_hdr_options.bpf.linked3.o                     estab                                                      2634       2558     -76 (-2.89%)
      test_parse_tcp_hdr_opt.bpf.linked3.o                   xdp_ingress_v6                                             1421       1420      -1 (-0.07%)
      test_parse_tcp_hdr_opt_dynptr.bpf.linked3.o            xdp_ingress_v6                                             1238       1237      -1 (-0.08%)
      test_tc_dtime.bpf.linked3.o                            egress_fwdns_prio100                                        414        411      -3 (-0.72%)
      
      Mostly a small improvement, but test_cls_redirect.bpf.linked3.o has a 13% regression.
      I checked the verifier log and found that this is due to pruning.
      For some JEQ/JNE branches impacted by this patch,
      one branch is explored and the other has state equivalence and is
      pruned.
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Acked-by: Dave Marchevsky <davemarchevsky@fb.com>
      Acked-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20230406164455.1045294-1-yhs@fb.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • Merge branch 'xsk: Fix unaligned descriptor validation' · a5f1da66
      Martin KaFai Lau authored
      Kal Conley says:
      
      ====================
      
      This patchset includes the test with the bugfix as requested here:
      https://lore.kernel.org/all/f1a32d5a-03e7-fce1-f5a5-6095f365f0a9@linux.dev/
      
      Patch #1 (the bugfix) is identical to the previous submission except
      that I improved the commit message slightly.
      
      Magnus: I improved the test code a little differently than you asked,
      since I thought this was a little simpler than having a separate
      function for now. Hopefully, you can live with this :-).
      ====================
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
    • selftests: xsk: Add test UNALIGNED_INV_DESC_4K1_FRAME_SIZE · c0801598
      Kal Conley authored
      Add an unaligned descriptor test for a frame size of 4001. Using an odd
      frame size ensures that the end of the UMEM is not near a page boundary.
      This allows testing descriptors that straddle the end of the UMEM but not
      a page.
      
      This test used to fail without the previous commit ("xsk: Fix unaligned
      descriptor validation").
      Signed-off-by: Kal Conley <kal.conley@dectris.com>
      Link: https://lore.kernel.org/r/20230405235920.7305-3-kal.conley@dectris.com
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
    • xsk: Fix unaligned descriptor validation · d769ccaf
      Kal Conley authored
      Make sure unaligned descriptors that straddle the end of the UMEM are
      considered invalid. Currently, descriptor validation is broken for
      zero-copy mode which only checks descriptors at page granularity.
      For example, descriptors in zero-copy mode that overrun the end of the
      UMEM but not a page boundary are (incorrectly) considered valid. The
      UMEM boundary check needs to happen before the page boundary and
      contiguity checks in xp_desc_crosses_non_contig_pg(). Do this check in
      xp_unaligned_validate_desc() instead like xp_check_unaligned() already
      does.
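      
      The intended ordering can be sketched as below (illustrative only, not the
      kernel implementation; umem_size here stands for the total UMEM address
      space):
      
      ```
      #include <stdbool.h>
      
      static bool unaligned_desc_within_umem(unsigned long long addr, unsigned int len,
                                             unsigned long long umem_size)
      {
              /* Reject descriptors running past the end of the UMEM first... */
              if (addr >= umem_size || len > umem_size - addr)
                      return false;
              /* ...and only then apply the page-boundary / contiguity checks
               * needed for zero-copy pools. */
              return true;
      }
      ```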
      
      Fixes: 2b43470a ("xsk: Introduce AF_XDP buffer allocation API")
      Signed-off-by: Kal Conley <kal.conley@dectris.com>
      Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Link: https://lore.kernel.org/r/20230405235920.7305-2-kal.conley@dectris.com
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
    • kallsyms: move module-related functions under correct configs · 34bf9347
      Viktor Malik authored
      Functions for searching module kallsyms should have non-empty
      definitions only if CONFIG_MODULES=y and CONFIG_KALLSYMS=y. Until now,
      only the CONFIG_MODULES check was used for many of these, which may have
      caused compilation errors on some configs.
      
      This patch moves all relevant functions under the correct configs.
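      
      The guard pattern in question looks roughly like this (an illustrative
      sketch; the exact declarations in the headers may differ):
      
      ```
      #if defined(CONFIG_MODULES) && defined(CONFIG_KALLSYMS)
      unsigned long find_kallsyms_symbol_value(struct module *mod, const char *name);
      #else /* !CONFIG_MODULES || !CONFIG_KALLSYMS */
      static inline unsigned long find_kallsyms_symbol_value(struct module *mod,
                                                             const char *name)
      {
              return 0;
      }
      #endif
      ```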
      
      Fixes: bd5314f8 ("kallsyms, bpf: Move find_kallsyms_symbol_value out of internal header")
      Signed-off-by: Viktor Malik <vmalik@redhat.com>
      Reported-by: kernel test robot <lkp@intel.com>
      Link: https://lore.kernel.org/oe-kbuild-all/202303181535.RFDCnz3E-lkp@intel.com/
      Link: https://lore.kernel.org/r/20230330102001.2183693-1-vmalik@redhat.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • Merge branch 'bpftool: Add inline annotations when dumping program CFGs' · c6ebae4c
      Alexei Starovoitov authored
      Quentin Monnet says:
      
      ====================
      
      This set contains some improvements for bpftool's "visual" program dump
      option, which produces the control flow graph in a DOT format. The main
      objective is to add support for inline annotations on such graphs, so that
      we can have the C source code for the program showing up alongside the
      instructions, when available. The last commits also make it possible to
      display the line numbers or the bare opcodes in the graph, as supported by
      regular program dumps.
      
      v3:
      - Fixed formatting of DOT graph: escape spaces, and remove indent that
        would cause some unwanted spaces to show up in the resulting graph.
      - Don't print line information if the record is empty.
      - Add '<' and ' ' to the list of escaped characters for generating the
        DOT graph.
      - Truncate long file paths, use shorter field names ("line", "col") for
        code location information in the graph, add missing separator space.
      - Add a commit to return an error if JSON output and CFG are both
        required.
      - Add a drive-by clean-up commit for bash completion (avoid unnecessary
        calls to _bpftool_once_attr()).
      
      v2: Replace fputc(..., stdout) with putchar(...) in dotlabel_puts().
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpftool: Clean up _bpftool_once_attr() calls in bash completion · 73192968
      Quentin Monnet authored
      In bpftool's bash completion file, function _bpftool_once_attr() is able
      to process multiple arguments. There are a few locations where this
      function is called multiple times in a row, each time for a single
      argument; let's pass all arguments instead to minimize the number of
      function calls required for the completion.
      Signed-off-by: Quentin Monnet <quentin@isovalent.com>
      Link: https://lore.kernel.org/r/20230405132120.59886-8-quentin@isovalent.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpftool: Support printing opcodes and source file references in CFG · 7483a7a7
      Quentin Monnet authored
      Add support for displaying opcodes and/or file references (filepath,
      line and column numbers) when dumping the control flow graphs of loaded
      BPF programs with bpftool.
      
      The filepaths in the records are absolute. To avoid the blocks in the
      graph getting too wide, we truncate the paths when they get too long (but
      we always keep the entire file name). In the unlikely case where the
      resulting file name is ambiguous, it remains possible to get the full path
      with a regular dump (no CFG).
      Signed-off-by: Quentin Monnet <quentin@isovalent.com>
      Link: https://lore.kernel.org/r/20230405132120.59886-7-quentin@isovalent.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpftool: Support "opcodes", "linum", "visual" simultaneously · 9b79f027
      Quentin Monnet authored
      When dumping a program, the keywords "opcodes" (for printing the raw
      opcodes), "linum" (for displaying the filename, line number, column
      number along with the source code), and "visual" (for generating the
      control flow graph for translated programs) are mutually exclusive. But
      there's no reason why they should be. Let's make it possible to pass
      several of them at once. The "file FILE" option, which makes bpftool
      output a binary image to a file, remains incompatible with the others.
      Signed-off-by: Quentin Monnet <quentin@isovalent.com>
      Link: https://lore.kernel.org/r/20230405132120.59886-6-quentin@isovalent.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpftool: Return an error on prog dumps if both CFG and JSON are required · 05a06be7
      Quentin Monnet authored
      We do not support JSON output for control flow graphs of programs with
      bpftool. So far, requiring both the CFG and JSON output would result in
      producing a null JSON object. It makes more sense to raise an error
      directly when parsing command line arguments and options, so that users
      know they won't get any output they might expect.
      
      If JSON is required for the graph, we leave it to Graphviz instead:
      
          # bpftool prog dump xlated <REF> visual | dot -Tjson
      Suggested-by: Eduard Zingerman <eddyz87@gmail.com>
      Signed-off-by: Quentin Monnet <quentin@isovalent.com>
      Link: https://lore.kernel.org/r/20230405132120.59886-5-quentin@isovalent.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpftool: Support inline annotations when dumping the CFG of a program · 9fd49684
      Quentin Monnet authored
      We support dumping the control flow graph of loaded programs to the DOT
      format with bpftool, but so far this feature wouldn't display the source
      code lines available through BTF along with the eBPF bytecode. Let's add
      support for these annotations, to make it easier to read the graph.
      
      In prog.c, we move the call to dump_xlated_cfg() in order to pass and
      use the full struct dump_data, instead of creating a minimal one in
      draw_bb_node().
      
      We pass the pointer to this struct down to dump_xlated_for_graph() in
      xlated_dumper.c, where most of the logic is added. We deal with BTF
      mostly like we do for plain or JSON output, except that we cannot use a
      "nr_skip" value to skip a given number of linfo records (we don't
      process the BPF instructions linearly, and apart from the root of the
      graph we don't know how many records we should skip, so we just store
      the last linfo and make sure the new one we find is different before
      printing it).
      
      When printing the source instructions to the label of a DOT graph node,
      there are a few subtleties to address. We want some special newline
      markers, and there are some characters that we must escape. To deal with
      them, we introduce a new dedicated function btf_dump_linfo_dotlabel() in
      btf_dumper.c. We'll reuse this function in a later commit to format the
      filepath, line, and column references as well.
      Signed-off-by: Quentin Monnet <quentin@isovalent.com>
      Link: https://lore.kernel.org/r/20230405132120.59886-4-quentin@isovalent.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpftool: Fix bug for long instructions in program CFG dumps · 67cf52cd
      Quentin Monnet authored
      When dumping the control flow graphs for programs using the 16-byte long
      load instruction, we need to skip the second part of this instruction
      when looking for the next instruction to process. Otherwise, we end up
      printing "BUG_ld_00" from the kernel disassembler in the CFG.
      
      Fixes: efcef17a ("tools: bpftool: generate .dot graph from CFG information")
      Signed-off-by: Quentin Monnet <quentin@isovalent.com>
      Link: https://lore.kernel.org/r/20230405132120.59886-3-quentin@isovalent.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpftool: Fix documentation about line info display for prog dumps · e27f0f16
      Quentin Monnet authored
      The documentation states that when line_info is available when dumping a
      program, the source line will be displayed "by default". There is no
      notion of "default" here: the line is always displayed if available,
      there is no way currently to turn it off.
      
      In the next sentence, the documentation states that if "linum" is used
      on the command line, the relevant filename, line, and column will be
      displayed "on top of the source line". This is incorrect, as they are
      currently displayed on the right side of the source line (or on top of
      the eBPF instruction, not the source).
      
      This commit fixes the documentation to address these points.
      Signed-off-by: Quentin Monnet <quentin@isovalent.com>
      Link: https://lore.kernel.org/r/20230405132120.59886-2-quentin@isovalent.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  5. 05 Apr, 2023 5 commits