1. 06 Feb, 2024 3 commits
  2. 05 Feb, 2024 5 commits
  3. 03 Feb, 2024 4 commits
  4. 02 Feb, 2024 8 commits
    • Shung-Hsi Yu's avatar
      selftests/bpf: trace_helpers.c: do not use poisoned type · a68b50f4
      Shung-Hsi Yu authored
      After commit c698eaeb ("selftests/bpf: trace_helpers.c: Optimize
      kallsyms cache") trace_helpers.c now includes libbpf_internal.h, and
      thus can no longer use the u32 type (among others) since they are poison
      in libbpf_internal.h. Replace u32 with __u32 to fix the following error
      when building trace_helpers.c on powerpc:
      
        error: attempt to use poisoned "u32"
      
      Fixes: c698eaeb ("selftests/bpf: trace_helpers.c: Optimize kallsyms cache")
      Signed-off-by: default avatarShung-Hsi Yu <shung-hsi.yu@suse.com>
      Acked-by: default avatarJiri Olsa <jolsa@kernel.org>
      Link: https://lore.kernel.org/r/20240202095559.12900-1-shung-hsi.yu@suse.comSigned-off-by: default avatarMartin KaFai Lau <martin.lau@kernel.org>
      a68b50f4
    • Andrii Nakryiko's avatar
      Merge branch 'improvements-for-tracking-scalars-in-the-bpf-verifier' · 6fb3f727
      Andrii Nakryiko authored
      Maxim Mikityanskiy says:
      
      ====================
      Improvements for tracking scalars in the BPF verifier
      
      From: Maxim Mikityanskiy <maxim@isovalent.com>
      
      The goal of this series is to extend the verifier's capabilities of
      tracking scalars when they are spilled to stack, especially when the
      spill or fill is narrowing. It also contains a fix by Eduard for
      infinite loop detection and a state pruning optimization by Eduard that
      compensates for a verification complexity regression introduced by
      tracking unbounded scalars. These improvements reduce the surface of
      false rejections that I saw while working on Cilium codebase.
      
      Patches 1-9 of the original series were previously applied in v2.
      
      Patches 1-2 (Maxim): Support the case when boundary checks are first
      performed after the register was spilled to the stack.
      
      Patches 3-4 (Maxim): Support narrowing fills.
      
      Patches 5-6 (Eduard): Optimization for state pruning in stacksafe() to
      mitigate the verification complexity regression.
      
      veristat -e file,prog,states -f '!states_diff<50' -f '!states_pct<10' -f '!states_a<10' -f '!states_b<10' -C ...
      
       * Without patch 5:
      
      File                  Program   States (A)  States (B)  States    (DIFF)
      --------------------  --------  ----------  ----------  ----------------
      pyperf100.bpf.o       on_event        4878        6528   +1650 (+33.83%)
      pyperf180.bpf.o       on_event        6936       11032   +4096 (+59.05%)
      pyperf600.bpf.o       on_event       22271       39455  +17184 (+77.16%)
      pyperf600_iter.bpf.o  on_event         400         490     +90 (+22.50%)
      strobemeta.bpf.o      on_event        4895       14028  +9133 (+186.58%)
      
       * With patch 5:
      
      File                     Program        States (A)  States (B)  States   (DIFF)
      -----------------------  -------------  ----------  ----------  ---------------
      bpf_xdp.o                tail_lb_ipv4         2770        2224   -546 (-19.71%)
      pyperf100.bpf.o          on_event             4878        5848   +970 (+19.89%)
      pyperf180.bpf.o          on_event             6936        8868  +1932 (+27.85%)
      pyperf600.bpf.o          on_event            22271       29656  +7385 (+33.16%)
      pyperf600_iter.bpf.o     on_event              400         450    +50 (+12.50%)
      xdp_synproxy_kern.bpf.o  syncookie_tc          280         226    -54 (-19.29%)
      xdp_synproxy_kern.bpf.o  syncookie_xdp         302         228    -74 (-24.50%)
      
      v2 changes:
      
      Fixed comments in patch 1, moved endianness checks to header files in
      patch 12 where possible, added Eduard's ACKs.
      
      v3 changes:
      
      Maxim: Removed __is_scalar_unbounded altogether, addressed Andrii's
      comments.
      
      Eduard: Patch #5 (#14 in v2) changed significantly:
      - Logical changes:
        - Handling of STACK_{MISC,ZERO} mix turned out to be incorrect:
          a mix of MISC and ZERO in old state is not equivalent to e.g.
          just MISC is current state, because verifier could have deduced
          zero scalars from ZERO slots in old state for some loads.
        - There is no reason to limit the change only to cases when
          old or current stack is a spill of unbounded scalar,
          it is valid to compare any 64-bit scalar spill with fake
          register impersonating MISC.
        - STACK_ZERO vs spilled zero case was dropped,
          after recent changes for zero handling by Andrii and Yonghong
          it is hard (impossible?) to conjure all ZERO slots for an spi.
          => the case does not make any difference in veristat results.
      - Use global static variable for unbound_reg (Andrii)
      - Code shuffling to remove duplication in stacksafe() (Andrii)
      ====================
      
      Link: https://lore.kernel.org/r/20240127175237.526726-1-maxtram95@gmail.comSigned-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      6fb3f727
    • Eduard Zingerman's avatar
      selftests/bpf: States pruning checks for scalar vs STACK_MISC · 73a28d9d
      Eduard Zingerman authored
      Check that stacksafe() compares spilled scalars with STACK_MISC.
      The following combinations are explored:
      - old spill of imprecise scalar is equivalent to cur STACK_{MISC,INVALID}
        (plus error in unpriv mode);
      - old spill of precise scalar is not equivalent to cur STACK_MISC;
      - old STACK_MISC is equivalent to cur scalar;
      - old STACK_MISC is not equivalent to cur non-scalar.
      Signed-off-by: default avatarEduard Zingerman <eddyz87@gmail.com>
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20240127175237.526726-7-maxtram95@gmail.com
      73a28d9d
    • Eduard Zingerman's avatar
      bpf: Handle scalar spill vs all MISC in stacksafe() · 6efbde20
      Eduard Zingerman authored
      When check_stack_read_fixed_off() reads value from an spi
      all stack slots of which are set to STACK_{MISC,INVALID},
      the destination register is set to unbound SCALAR_VALUE.
      
      Exploit this fact by allowing stacksafe() to use a fake
      unbound scalar register to compare 'mmmm mmmm' stack value
      in old state vs spilled 64-bit scalar in current state
      and vice versa.
      
      Veristat results after this patch show some gains:
      
      ./veristat -C -e file,prog,states -f 'states_pct>10'  not-opt after
      File                     Program                States   (DIFF)
      -----------------------  ---------------------  ---------------
      bpf_overlay.o            tail_rev_nodeport_lb4    -45 (-15.85%)
      bpf_xdp.o                tail_lb_ipv4            -541 (-19.57%)
      pyperf100.bpf.o          on_event                -680 (-10.42%)
      pyperf180.bpf.o          on_event               -2164 (-19.62%)
      pyperf600.bpf.o          on_event               -9799 (-24.84%)
      strobemeta.bpf.o         on_event               -9157 (-65.28%)
      xdp_synproxy_kern.bpf.o  syncookie_tc             -54 (-19.29%)
      xdp_synproxy_kern.bpf.o  syncookie_xdp            -74 (-24.50%)
      Signed-off-by: default avatarEduard Zingerman <eddyz87@gmail.com>
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20240127175237.526726-6-maxtram95@gmail.com
      6efbde20
    • Maxim Mikityanskiy's avatar
      selftests/bpf: Add test cases for narrowing fill · 067313a8
      Maxim Mikityanskiy authored
      The previous commit allowed to preserve boundaries and track IDs of
      scalars on narrowing fills. Add test cases for that pattern.
      Signed-off-by: default avatarMaxim Mikityanskiy <maxim@isovalent.com>
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Acked-by: default avatarEduard Zingerman <eddyz87@gmail.com>
      Link: https://lore.kernel.org/bpf/20240127175237.526726-5-maxtram95@gmail.com
      067313a8
    • Maxim Mikityanskiy's avatar
      bpf: Preserve boundaries and track scalars on narrowing fill · c1e6148c
      Maxim Mikityanskiy authored
      When the width of a fill is smaller than the width of the preceding
      spill, the information about scalar boundaries can still be preserved,
      as long as it's coerced to the right width (done by coerce_reg_to_size).
      Even further, if the actual value fits into the fill width, the ID can
      be preserved as well for further tracking of equal scalars.
      
      Implement the above improvements, which makes narrowing fills behave the
      same as narrowing spills and MOVs between registers.
      
      Two tests are adjusted to accommodate for endianness differences and to
      take into account that it's now allowed to do a narrowing fill from the
      least significant bits.
      
      reg_bounds_sync is added to coerce_reg_to_size to correctly adjust
      umin/umax boundaries after the var_off truncation, for example, a 64-bit
      value 0xXXXXXXXX00000000, when read as a 32-bit, gets umin = 0, umax =
      0xFFFFFFFF, var_off = (0x0; 0xffffffff00000000), which needs to be
      synced down to umax = 0, otherwise reg_bounds_sanity_check doesn't pass.
      Signed-off-by: default avatarMaxim Mikityanskiy <maxim@isovalent.com>
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20240127175237.526726-4-maxtram95@gmail.com
      c1e6148c
    • Maxim Mikityanskiy's avatar
      selftests/bpf: Test tracking spilled unbounded scalars · 6be503ce
      Maxim Mikityanskiy authored
      The previous commit added tracking for unbounded scalars on spill. Add
      the test case to check the new functionality.
      Signed-off-by: default avatarMaxim Mikityanskiy <maxim@isovalent.com>
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Acked-by: default avatarEduard Zingerman <eddyz87@gmail.com>
      Link: https://lore.kernel.org/bpf/20240127175237.526726-3-maxtram95@gmail.com
      6be503ce
    • Maxim Mikityanskiy's avatar
      bpf: Track spilled unbounded scalars · e67ddd9b
      Maxim Mikityanskiy authored
      Support the pattern where an unbounded scalar is spilled to the stack,
      then boundary checks are performed on the src register, after which the
      stack frame slot is refilled into a register.
      
      Before this commit, the verifier didn't treat the src register and the
      stack slot as related if the src register was an unbounded scalar. The
      register state wasn't copied, the id wasn't preserved, and the stack
      slot was marked as STACK_MISC. Subsequent boundary checks on the src
      register wouldn't result in updating the boundaries of the spilled
      variable on the stack.
      
      After this commit, the verifier will preserve the bond between src and
      dst even if src is unbounded, which permits to do boundary checks on src
      and refill dst later, still remembering its boundaries. Such a pattern
      is sometimes generated by clang when compiling complex long functions.
      
      One test is adjusted to reflect that now unbounded scalars are tracked.
      Signed-off-by: default avatarMaxim Mikityanskiy <maxim@isovalent.com>
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Acked-by: default avatarEduard Zingerman <eddyz87@gmail.com>
      Link: https://lore.kernel.org/bpf/20240127175237.526726-2-maxtram95@gmail.com
      e67ddd9b
  5. 01 Feb, 2024 13 commits
  6. 31 Jan, 2024 2 commits
    • Daniel Xu's avatar
      bpf: btf: Support flags for BTF_SET8 sets · 79b47344
      Daniel Xu authored
      This commit adds support for flags on BTF_SET8s. struct btf_id_set8
      already supported 32 bits worth of flags, but was only used for
      alignment purposes before.
      
      We now use these bits to encode flags. The first use case is tagging
      kfunc sets with a flag so that pahole can recognize which
      BTF_ID_FLAGS(func, ..) are actual kfuncs.
      Signed-off-by: default avatarDaniel Xu <dxu@dxuuu.xyz>
      Link: https://lore.kernel.org/r/7bb152ec76d6c2c930daec88e995bf18484a5ebb.1706491398.git.dxu@dxuuu.xyzSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      79b47344
    • Manu Bretelle's avatar
      selftests/bpf: Disable IPv6 for lwt_redirect test · 2ef61296
      Manu Bretelle authored
      After a recent change in the vmtest runner, this test started failing
      sporadically.
      
      Investigation showed that this test was subject to race condition which
      got exacerbated after the vm runner change. The symptoms being that the
      logic that waited for an ICMPv4 packet is naive and will break if 5 or
      more non-ICMPv4 packets make it to tap0.
      When ICMPv6 is enabled, the kernel will generate traffic such as ICMPv6
      router solicitation...
      On a system with good performance, the expected ICMPv4 packet would very
      likely make it to the network interface promptly, but on a system with
      poor performance, those "guarantees" do not hold true anymore.
      
      Given that the test is IPv4 only, this change disable IPv6 in the test
      netns by setting `net.ipv6.conf.all.disable_ipv6` to 1.
      This essentially leaves "ping" as the sole generator of traffic in the
      network namespace.
      If this test was to be made IPv6 compatible, the logic in
      `wait_for_packet` would need to be modified.
      
      In more details...
      
      At a high level, the test does:
      - create a new namespace
      - in `setup_redirect_target` set up lo, tap0, and link_err interfaces as
        well as add 2 routes that attaches ingress/egress sections of
        `test_lwt_redirect.bpf.o` to the xmit path.
      - in `send_and_capture_test_packets` send an ICMP packet and read off
        the tap interface (using `wait_for_packet`) to check that a ICMP packet
        with the right size is read.
      
      `wait_for_packet` will try to read `max_retry` (5) times from the tap0
      fd looking for an ICMPv4 packet matching some criteria.
      
      The problem is that when we set up the `tap0` interface, because IPv6 is
      enabled by default, traffic such as Router solicitation is sent through
      tap0, as in:
      
        # tcpdump -r /tmp/lwt_redirect.pc
        reading from file /tmp/lwt_redirect.pcap, link-type EN10MB (Ethernet)
        04:46:23.578352 IP6 :: > ff02::1:ffc0:4427: ICMP6, neighbor solicitation, who has fe80::fcba:dff:fec0:4427, length 32
        04:46:23.659522 IP6 :: > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
        04:46:24.389169 IP 10.0.0.1 > 20.0.0.9: ICMP echo request, id 122, seq 1, length 108
        04:46:24.618599 IP6 fe80::fcba:dff:fec0:4427 > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
        04:46:24.619985 IP6 fe80::fcba:dff:fec0:4427 > ff02::2: ICMP6, router solicitation, length 16
        04:46:24.767326 IP6 fe80::fcba:dff:fec0:4427 > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
        04:46:28.936402 IP6 fe80::fcba:dff:fec0:4427 > ff02::2: ICMP6, router solicitation, length 16
      
      If `wait_for_packet` sees 5 non-ICMPv4 packets, it will return 0, which is what we see in:
      
        2024-01-31T03:51:25.0336992Z test_lwt_redirect_run:PASS:netns_create 0 nsec
        2024-01-31T03:51:25.0341309Z open_netns:PASS:malloc token 0 nsec
        2024-01-31T03:51:25.03448444Z open_netns:PASS:open /proc/self/ns/net 0 nsec
        2024-01-31T03:51:25.0350071Z open_netns:PASS:open netns fd 0 nsec
        2024-01-31T03:51:25.0353516Z open_netns:PASS:setns 0 nsec
        2024-01-31T03:51:25.0356560Z test_lwt_redirect_run:PASS:setns 0 nsec
        2024-01-31T03:51:25.0360140Z open_tuntap:PASS:open(/dev/net/tun) 0 nsec
        2024-01-31T03:51:25.0363822Z open_tuntap:PASS:ioctl(TUNSETIFF) 0 nsec
        2024-01-31T03:51:25.0367402Z open_tuntap:PASS:fcntl(O_NONBLOCK) 0 nsec
        2024-01-31T03:51:25.0371167Z setup_redirect_target:PASS:open_tuntap 0 nsec
        2024-01-31T03:51:25.0375180Z setup_redirect_target:PASS:if_nametoindex 0 nsec
        2024-01-31T03:51:25.0379929Z setup_redirect_target:PASS:ip link add link_err type dummy 0 nsec
        2024-01-31T03:51:25.0384874Z setup_redirect_target:PASS:ip link set lo up 0 nsec
        2024-01-31T03:51:25.0389678Z setup_redirect_target:PASS:ip addr add dev lo 10.0.0.1/32 0 nsec
        2024-01-31T03:51:25.0394814Z setup_redirect_target:PASS:ip link set link_err up 0 nsec
        2024-01-31T03:51:25.0399874Z setup_redirect_target:PASS:ip link set tap0 up 0 nsec
        2024-01-31T03:51:25.0407731Z setup_redirect_target:PASS:ip route add 10.0.0.0/24 dev link_err encap bpf xmit obj test_lwt_redirect.bpf.o sec redir_ingress 0 nsec
        2024-01-31T03:51:25.0419105Z setup_redirect_target:PASS:ip route add 20.0.0.0/24 dev link_err encap bpf xmit obj test_lwt_redirect.bpf.o sec redir_egress 0 nsec
        2024-01-31T03:51:25.0427209Z test_lwt_redirect_normal:PASS:setup_redirect_target 0 nsec
        2024-01-31T03:51:25.0431424Z ping_dev:PASS:if_nametoindex 0 nsec
        2024-01-31T03:51:25.0437222Z send_and_capture_test_packets:FAIL:wait_for_epacket unexpected wait_for_epacket: actual 0 != expected 1
        2024-01-31T03:51:25.0448298Z (/tmp/work/bpf/bpf/tools/testing/selftests/bpf/prog_tests/lwt_redirect.c:175: errno: Success) test_lwt_redirect_normal egress test fails
        2024-01-31T03:51:25.0457124Z close_netns:PASS:setns 0 nsec
      
      When running in a VM which potential resource contrains, the odds that calling
      `ping` is not scheduled very soon after bringing `tap0` up increases,
      and with this the chances to get our ICMP packet pushed to position 6+
      in the network trace.
      
      To confirm this indeed solves the issue, I ran the test 100 times in a
      row with:
      
        errors=0
        successes=0
        for i in `seq 1 100`
        do
          ./test_progs -t lwt_redirect/lwt_redirect_normal
          if [ $? -eq 0 ]; then
            successes=$((successes+1))
          else
            errors=$((errors+1))
          fi
        done
        echo "successes: $successes/errors: $errors"
      
      While this test would at least fail a couple of time every 10 runs, here
      it ran 100 times with no error.
      
      Fixes: 43a7c3ef ("selftests/bpf: Add lwt_xmit tests for BPF_REDIRECT")
      Signed-off-by: default avatarManu Bretelle <chantr4@gmail.com>
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20240131053212.2247527-1-chantr4@gmail.com
      2ef61296
  7. 30 Jan, 2024 5 commits