1. 10 Mar, 2022 3 commits
  2. 09 Mar, 2022 10 commits
    • Merge branch 'Add support for transmitting packets using XDP in bpf_prog_run()' · de55c9a1
      Alexei Starovoitov authored
      Toke Høiland-Jørgensen says:
      
      ====================
      
      This series adds support for transmitting packets using XDP in
      bpf_prog_run(), by enabling a new "live packet" mode which will handle
      the XDP program return codes and redirect the packets to the stack or other
      devices.
      
      The primary use case for this is testing the redirect map types and the
      ndo_xdp_xmit driver operation without an external traffic generator. But it
      turns out to also be useful for creating a programmable traffic generator
      in XDP, as well as injecting frames into the stack. A sample traffic
      generator, which was included in previous versions of the series, but now
      moved to xdp-tools, transmits up to 9 Mpps/core on my test machine.
      
      To transmit the frames, the new mode instantiates a page_pool structure in
      bpf_prog_run() and initialises the pages to contain XDP frames with the
      data passed in by userspace. These frames can then be handled as though
      they came from the hardware XDP path, and the existing page_pool code takes
      care of returning and recycling them. The setup is optimised for high
      performance with a high number of repetitions to support stress testing and
      the traffic generator use case; see patch 1 for details.
      
      v11:
      - Fix override of return code in xdp_test_run_batch()
      - Add Martin's ACKs to remaining patches
      
      v10:
      - Only propagate memory allocation errors from xdp_test_run_batch()
      - Get rid of BPF_F_TEST_XDP_RESERVED; batch_size can be used to probe
      - Check that batch_size is unset in non-XDP test_run funcs
      - Lower the number of repetitions in the selftest to 10k
      - Count number of recycled pages in the selftest
      - Fix a few other nits from Martin, carry forward ACKs
      
      v9:
      - XDP_DROP packets in the selftest to ensure pages are recycled
      - Fix a few issues reported by the kernel test robot
      - Rewrite the documentation of the batch size to make it a bit clearer
      - Rebase to newest bpf-next
      
      v8:
      - Make the batch size configurable from userspace
      - Don't interrupt the packet loop on errors in do_redirect (this can be
        caught from the tracepoint)
      - Add documentation of the feature
      - Add reserved flag userspace can use to probe for support (kernel didn't
        check flags previously)
      - Rebase to newest bpf-next, disallow live mode for jumbo frames
      
      v7:
      - Extend the local_bh_disable() to cover the full test run loop, to prevent
        running concurrently with the softirq. Fixes a deadlock with veth xmit.
      - Reinstate the forwarding sysctl setting in the selftest, and bump up the
        number of packets being transmitted to trigger the above bug.
      - Update commit message to make it clear that user space can select the
        ingress interface.
      
      v6:
      - Fix meta vs data pointer setting and add a selftest for it
      - Add local_bh_disable() around code passing packets up the stack
      - Create a new netns for the selftest and use a TC program instead of the
        forwarding hack to count packets being XDP_PASS'ed from the test prog.
      - Check for the correct ingress ifindex in the selftest
      - Rebase and drop patches 1-5 that were already merged
      
      v5:
      - Rebase to current bpf-next
      
      v4:
      - Fix a few code style issues (Alexei)
      - Also handle the other return codes: XDP_PASS builds skbs and injects them
        into the stack, and XDP_TX is turned into a redirect out the same
        interface (Alexei).
      - Drop the last patch adding an xdp_trafficgen program to samples/bpf; this
        will live in xdp-tools instead (Alexei).
      - Add a separate bpf_test_run_xdp_live() function to test_run.c instead of
        entangling the new mode in the existing bpf_test_run().
      
      v3:
      - Reorder patches to make sure they all build individually (Patchwork)
      - Remove a couple of unused variables (Patchwork)
      - Remove unlikely() annotation in slow path and add back John's ACK that I
        accidentally dropped for v2 (John)
      
      v2:
      - Split up __xdp_do_redirect to avoid passing two pointers to it (John)
      - Always reset context pointers before each test run (John)
      - Use get_mac_addr() from xdp_sample_user.h instead of rolling our own (Kumar)
      - Fix wrong offset for metadata pointer
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: Add selftest for XDP_REDIRECT in BPF_PROG_RUN · 55fcacca
      Toke Høiland-Jørgensen authored
      This adds a selftest for the XDP_REDIRECT facility in BPF_PROG_RUN, that
      redirects packets into a veth and counts them using an XDP program on the
      other side of the veth pair and a TC program on the local side of the veth.
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20220309105346.100053-6-toke@redhat.com
    • selftests/bpf: Move open_netns() and close_netns() into network_helpers.c · a3033884
      Toke Høiland-Jørgensen authored
      These will also be used by the xdp_do_redirect test being added in the next
      commit.
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20220309105346.100053-5-toke@redhat.com
    • libbpf: Support batch_size option to bpf_prog_test_run · 24592ad1
      Toke Høiland-Jørgensen authored
      Add support for setting the new batch_size parameter to BPF_PROG_TEST_RUN
      to libbpf; just add it as an option and pass it through to the kernel.
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20220309105346.100053-4-toke@redhat.com
    • Documentation/bpf: Add documentation for BPF_PROG_RUN · 1a7551f1
      Toke Høiland-Jørgensen authored
      This adds documentation for the BPF_PROG_RUN command; a short overview of
      the command itself, and a more verbose description of the "live packet"
      mode for XDP introduced in the previous commit.
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20220309105346.100053-3-toke@redhat.com
    • bpf: Add "live packet" mode for XDP in BPF_PROG_RUN · b530e9e1
      Toke Høiland-Jørgensen authored
      This adds support for running XDP programs through BPF_PROG_RUN in a mode
      that enables live packet processing of the resulting frames. Previous uses
      of BPF_PROG_RUN for XDP returned the XDP program return code and the
      modified packet data to userspace, which is useful for unit testing of XDP
      programs.
      
      The existing BPF_PROG_RUN for XDP allows userspace to set the ingress
      ifindex and RXQ number as part of the context object being passed to the
      kernel. This patch reuses that code, but adds a new mode with different
      semantics, which can be selected with the new BPF_F_TEST_XDP_LIVE_FRAMES
      flag.
      
      When running BPF_PROG_RUN in this mode, the XDP program return codes will
      be honoured: returning XDP_PASS will result in the frame being injected
      into the networking stack as if it came from the selected networking
      interface, while returning XDP_TX and XDP_REDIRECT will result in the frame
      being transmitted out that interface. XDP_TX is translated into an
      XDP_REDIRECT operation to the same interface, since the real XDP_TX action
      is only possible from within the network drivers themselves, not from the
      process context where BPF_PROG_RUN is executed.
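The return-code handling described above can be sketched as a small userspace model. This is not the kernel implementation: the `model_` names are hypothetical, the enum values mirror `enum xdp_action` from the UAPI header, and treating every other return code as a drop that recycles the page is an assumption.

```c
/* Values mirror enum xdp_action in the UAPI header linux/bpf.h. */
enum model_xdp_action {
	MODEL_XDP_ABORTED = 0,
	MODEL_XDP_DROP = 1,
	MODEL_XDP_PASS = 2,
	MODEL_XDP_TX = 3,
	MODEL_XDP_REDIRECT = 4,
};

/* Hypothetical labels for what happens to a frame in live packet mode. */
enum model_disposition {
	MODEL_RECYCLE_PAGE,      /* frame dropped, page goes back to the pool */
	MODEL_INJECT_INTO_STACK, /* frame handed to the networking stack */
	MODEL_REDIRECT_SAME_IF,  /* XDP_TX turned into a redirect to the same interface */
	MODEL_REDIRECT_TARGET,   /* frame follows the redirect chosen by the program */
};

/* Map an XDP return code to its live-mode outcome (model only). */
static enum model_disposition model_live_disposition(enum model_xdp_action act)
{
	switch (act) {
	case MODEL_XDP_PASS:
		return MODEL_INJECT_INTO_STACK;
	case MODEL_XDP_TX:
		return MODEL_REDIRECT_SAME_IF;
	case MODEL_XDP_REDIRECT:
		return MODEL_REDIRECT_TARGET;
	default:
		/* Assumption: anything else is treated like XDP_DROP. */
		return MODEL_RECYCLE_PAGE;
	}
}
```

The point of the model is only that XDP_TX and XDP_REDIRECT end up on the same transmit path, while XDP_PASS leaves the XDP layer entirely.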
      
      Internally, this new mode of operation creates a page pool instance while
      setting up the test run, and feeds pages from that into the XDP program.
      The setup cost of this is amortised over the number of repetitions
      specified by userspace.
      
      To support the performance testing use case, we further optimise the setup
      step so that all pages in the pool are pre-initialised with the packet
      data, and pre-computed context and xdp_frame objects stored at the start of
      each page. This makes it possible to entirely avoid touching the page
      content on each XDP program invocation, and enables sending up to 9
      Mpps/core on my test box.
      
      Because the data pages are recycled by the page pool, and the test runner
      doesn't re-initialise them for each run, subsequent invocations of the XDP
      program will see the packet data in the state it was after the last time it
      ran on that particular page. This means that an XDP program that modifies
      the packet before redirecting it has to be careful about which assumptions
      it makes about the packet content, but that is only an issue for the most
      naively written programs.
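The recycling behaviour described above can be illustrated with a toy userspace model. Everything here is hypothetical (a two-page pool, a round-robin cursor standing in for page_pool recycling, a "program" that bumps the first byte); it only shows why a later invocation can observe data left behind by an earlier one.

```c
#include <stddef.h>
#include <string.h>

#define MODEL_POOL_PAGES 2
#define MODEL_PKT_LEN 4

struct model_page {
	unsigned char data[MODEL_PKT_LEN];
};

struct model_pool {
	struct model_page pages[MODEL_POOL_PAGES];
	size_t next; /* round-robin cursor standing in for page recycling */
};

/* Copy the user-supplied packet into every page once, at setup time. */
static void model_pool_init(struct model_pool *pool, const unsigned char *pkt)
{
	for (size_t i = 0; i < MODEL_POOL_PAGES; i++)
		memcpy(pool->pages[i].data, pkt, MODEL_PKT_LEN);
	pool->next = 0;
}

/* Stand-in for an XDP program that rewrites the packet before redirecting. */
static void model_prog_bump_first_byte(struct model_page *page)
{
	page->data[0]++;
}

/*
 * Run the program `repeat` times without re-initialising pages.
 * Returns the first byte as seen by the last invocation on entry.
 */
static unsigned char model_run(struct model_pool *pool, int repeat)
{
	unsigned char seen = 0;

	for (int i = 0; i < repeat; i++) {
		struct model_page *page = &pool->pages[pool->next];

		seen = page->data[0];
		model_prog_bump_first_byte(page);
		pool->next = (pool->next + 1) % MODEL_POOL_PAGES;
	}
	return seen;
}
```

With a two-page pool, the third invocation lands on the first page again and sees the byte already incremented, which is the situation a packet-modifying program has to tolerate.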
      
      Enabling the new flag is only allowed when not setting ctx_out and data_out
      in the test specification, since using it means frames will be redirected
      somewhere else, so they can't be returned.
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20220309105346.100053-2-toke@redhat.com
    • Merge branch 'BPF test_progs tests improvement' · 3399dd9f
      Andrii Nakryiko authored
      Mykola Lysenko says:
      
      ====================
      
      The first patch reduces sample_freq to 1000 to ensure the tests
      work even when kernel.perf_event_max_sample_rate was reduced to 1000.
      
      The patches for send_signal and find_vma tune the test implementation to
      make sure the needed thread is scheduled. Also, both tests will finish as
      soon as possible after the test condition is met.
      ====================
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    • Improve stability of find_vma BPF test · ba83af05
      Mykola Lysenko authored
      Remove the unneeded sleep and increase the length of the dummy CPU
      intensive computation to guarantee test process execution.
      Also, complete the aforementioned computation as soon as
      the test success criteria are met.
      Signed-off-by: Mykola Lysenko <mykolal@fb.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20220308200449.1757478-4-mykolal@fb.com
    • Improve send_signal BPF test stability · 1fd49864
      Mykola Lysenko authored
      Substitute sleep with a dummy CPU intensive computation.
      Finish the aforementioned computation as soon as the signal is
      delivered to the test process. Make the BPF code
      execute only when the PID global variable is set.
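The idea of replacing a sleep with bounded busy work that stops once the signal arrives can be sketched as follows. This is a minimal illustration, not the selftest code: the handler, the flag, and `busy_wait_for_signal()` are hypothetical names.

```c
#include <signal.h>

/* Set from the signal handler; sig_atomic_t keeps the access async-signal-safe. */
static volatile sig_atomic_t sigusr1_received;

static void on_sigusr1(int sig)
{
	(void)sig;
	sigusr1_received = 1;
}

/*
 * Burn CPU for at most max_iters iterations, stopping early once the
 * signal has been seen. Returns the number of iterations executed.
 */
static long busy_wait_for_signal(long max_iters)
{
	volatile long sink = 0; /* volatile so the loop is not optimised away */
	long i;

	for (i = 0; i < max_iters && !sigusr1_received; i++)
		sink += i;
	return i;
}
```

Unlike sleep(), the loop keeps the process runnable on a CPU, and the early exit keeps the test short once the condition is met.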
      Signed-off-by: Mykola Lysenko <mykolal@fb.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20220308200449.1757478-3-mykolal@fb.com
    • Improve perf related BPF tests (sample_freq issue) · d4b54054
      Mykola Lysenko authored
      The Linux kernel may automatically reduce the kernel.perf_event_max_sample_rate
      value when running tests in parallel on slow systems. The kernel checks
      against this limit when opening a perf event with the freq=1 parameter set.
      The lower bound is 1000. This patch reduces the sample_freq value to 1000
      in all BPF tests that use sample_freq to ensure they can always open a
      perf event.
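As a rough illustration of the setting involved, a frequency-based `perf_event_attr` could be filled in as below. The helper name is hypothetical and the choice of a software CPU-clock event is an assumption; the selftests may use different event types. Only the `freq` and `sample_freq` fields relate to the change described here.

```c
#include <linux/perf_event.h>
#include <string.h>

/* Fill in a sampling attr that requests 1000 samples per second. */
static void init_cpu_clock_sampling_attr(struct perf_event_attr *attr)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_SOFTWARE;        /* assumption: software event */
	attr->config = PERF_COUNT_SW_CPU_CLOCK; /* assumption: CPU clock */
	attr->freq = 1;           /* interpret sample_freq as Hz, not a period */
	attr->sample_freq = 1000; /* stays within the lowest possible limit */
}
```

The attr would then be passed to the perf_event_open() system call; a sample_freq above the current kernel.perf_event_max_sample_rate is what makes that call fail.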
      Signed-off-by: Mykola Lysenko <mykolal@fb.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20220308200449.1757478-2-mykolal@fb.com
  3. 08 Mar, 2022 9 commits
  4. 06 Mar, 2022 5 commits
    • Merge branch 'bpf: add __percpu tagging in vmlinux BTF' · c344b9fc
      Alexei Starovoitov authored
      Hao Luo says:
      
      ====================
      
      This patchset is very similar to Yonghong's patchset on adding
      __user tagging [1], where a "user" btf_type_tag was introduced to
      describe __user memory pointers. A similar approach can be applied to
      __percpu pointers. The __percpu attribute in the kernel is used to identify
      pointers that point to memory allocated in the percpu region. Normally,
      accessing __percpu memory requires using special functions like
      per_cpu_ptr() etc. Directly accessing a __percpu pointer is meaningless.
      
      Currently vmlinux BTF does not have a way to differentiate a __percpu
      pointer from a regular pointer. So BPF programs are allowed to load
      __percpu memory directly, which is an incorrect behavior.
      
      With the previous work that encodes __user information in BTF, a nice
      framework has been set up to allow us to encode __percpu information in
      BTF and let the verifier reject programs that try to directly access
      percpu pointers. Previously, there was a PTR_TO_PERCPU_BTF_ID reg type which
      was used to represent those percpu static variables in the kernel. Pahole
      is able to collect variables that are stored in ".data..percpu" section
      in the kernel image and emit BTF information for those variables. The
      bpf_per_cpu_ptr() and bpf_this_cpu_ptr() helper functions were added to
      access these variables. Now with __percpu information, we can tag those
      __percpu fields in a struct (such as cgroup->rstat_cpu) and allow the
      pair of bpf percpu helpers to access them as well.
      
      In addition to adding __percpu tagging, this patchset also fixes a
      harmless bug in the previous patch that introduced __user. Patch 01/04
      is for that. Patch 02/04 adds the new attribute "percpu". Patch 03/04
      adds MEM_PERCPU tag for PTR_TO_BTF_ID and replaces PTR_TO_PERCPU_BTF_ID
      with (BTF_ID | MEM_PERCPU). Patch 04/04 refactors the btf_tag test a bit
      and adds tests for percpu tag.
      
      Like [1], the minimal requirements for btf_type_tag are
      clang (>= clang14) and pahole (>= 1.23).
      
      [1] https://lore.kernel.org/bpf/20211220015110.3rqxk5qwub3pa2gh@ast-mbp.dhcp.thefacebook.com/t/
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: Add a test for btf_type_tag "percpu" · 50c6b8a9
      Hao Luo authored
      Add test for percpu btf_type_tag. Similar to the "user" tag, we test
      the following cases:
      
       1. __percpu struct field.
       2. __percpu as function parameter.
       3. per_cpu_ptr() accepts dynamically allocated __percpu memory.
      
      Because the test for "user" and the test for "percpu" are very similar,
      a little bit of refactoring has been done in btf_tag.c. Basically, both
      tests share the same function for loading vmlinux and module btf.
      
      Example output from log:
      
       > ./test_progs -v -t btf_tag
      
       libbpf: prog 'test_percpu1': BPF program load failed: Permission denied
       libbpf: prog 'test_percpu1': -- BEGIN PROG LOAD LOG --
       ...
       ; g = arg->a;
       1: (61) r1 = *(u32 *)(r1 +0)
       R1 is ptr_bpf_testmod_btf_type_tag_1 access percpu memory: off=0
       ...
       test_btf_type_tag_mod_percpu:PASS:btf_type_tag_percpu 0 nsec
       #26/6 btf_tag/btf_type_tag_percpu_mod1:OK
      
       libbpf: prog 'test_percpu2': BPF program load failed: Permission denied
       libbpf: prog 'test_percpu2': -- BEGIN PROG LOAD LOG --
       ...
       ; g = arg->p->a;
       2: (61) r1 = *(u32 *)(r1 +0)
       R1 is ptr_bpf_testmod_btf_type_tag_1 access percpu memory: off=0
       ...
       test_btf_type_tag_mod_percpu:PASS:btf_type_tag_percpu 0 nsec
       #26/7 btf_tag/btf_type_tag_percpu_mod2:OK
      
       libbpf: prog 'test_percpu_load': BPF program load failed: Permission denied
       libbpf: prog 'test_percpu_load': -- BEGIN PROG LOAD LOG --
       ...
       ; g = (__u64)cgrp->rstat_cpu->updated_children;
       2: (79) r1 = *(u64 *)(r1 +48)
       R1 is ptr_cgroup_rstat_cpu access percpu memory: off=48
       ...
       test_btf_type_tag_vmlinux_percpu:PASS:btf_type_tag_percpu_load 0 nsec
       #26/8 btf_tag/btf_type_tag_percpu_vmlinux_load:OK
      
       load_btfs:PASS:could not load vmlinux BTF 0 nsec
       test_btf_type_tag_vmlinux_percpu:PASS:btf_type_tag_percpu 0 nsec
       test_btf_type_tag_vmlinux_percpu:PASS:btf_type_tag_percpu_helper 0 nsec
       #26/9 btf_tag/btf_type_tag_percpu_vmlinux_helper:OK
      Signed-off-by: Hao Luo <haoluo@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20220304191657.981240-5-haoluo@google.com
    • bpf: Reject programs that try to load __percpu memory. · 5844101a
      Hao Luo authored
      With the introduction of the btf_type_tag "percpu", we can add a
      MEM_PERCPU to identify those pointers that point to percpu memory.
      The ability to differentiate percpu pointers from regular memory
      pointers has two benefits:
      
       1. It forbids unexpected use of percpu pointers, such as direct loads.
          In kernel, there are special functions used for accessing percpu
          memory. Directly loading percpu memory is meaningless. We already
          have BPF helpers like bpf_per_cpu_ptr() and bpf_this_cpu_ptr() that
          wrap the kernel percpu functions. So we can now convert percpu
          pointers into regular pointers in a safe way.
      
       2. Previously, bpf_per_cpu_ptr() and bpf_this_cpu_ptr() only work on
          PTR_TO_PERCPU_BTF_ID, a special reg_type which describes static
          percpu variables in kernel (we rely on pahole to encode them into
          vmlinux BTF). Now, since we can identify __percpu tagged pointers,
          we can also identify dynamically allocated percpu memory as well.
          It means we can use bpf_xxx_cpu_ptr() on dynamic percpu memory.
          This would be very convenient when accessing fields like
          "cgroup->rstat_cpu".
      Signed-off-by: Hao Luo <haoluo@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20220304191657.981240-4-haoluo@google.com
    • compiler_types: Define __percpu as __attribute__((btf_type_tag("percpu"))) · 9216c916
      Hao Luo authored
      This is similar to commit 7472d5a6 ("compiler_types: define __user as
      __attribute__((btf_type_tag("user")))"), where a type tag "user" was
      introduced to identify the pointers that point to user memory. With that
      change, the newest compile toolchain can encode __user information into
      vmlinux BTF, which can be used by the BPF verifier to enforce safe
      program behaviors.
      
      Similarly, we have __percpu attribute, which is mainly used to indicate
      memory is allocated in percpu region. The __percpu pointers in kernel
      are supposed to be used together with functions like per_cpu_ptr() and
      this_cpu_ptr(), which perform necessary calculation on the pointer's
      base address. Without the btf_type_tag introduced in this patch,
      __percpu pointers will be treated as regular memory pointers in vmlinux
      BTF and BPF programs are allowed to directly dereference them, generating
      incorrect behaviors. Now with "percpu" btf_type_tag, the BPF verifier is
      able to differentiate __percpu pointers from regular pointers and forbids
      unexpected behaviors like direct load.
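Why a direct load through a __percpu pointer is meaningless can be shown with a toy userspace model. This is not how the kernel lays out percpu memory; the `model_` names, the fixed CPU count, and the representation of a percpu pointer as a plain offset are all assumptions made for illustration.

```c
#include <stddef.h>

#define MODEL_NR_CPUS 4

struct model_counter {
	int hits;
};

/* One private copy of the variable per CPU. */
static struct model_counter model_percpu_area[MODEL_NR_CPUS];

/*
 * In this model a "percpu pointer" is only an offset into each CPU's
 * area, so it does not address any copy until a CPU is chosen.
 */
typedef size_t model_percpu_ptr;

/* Resolve a percpu offset to the copy belonging to `cpu`, or NULL. */
static struct model_counter *model_per_cpu_ptr(model_percpu_ptr off, int cpu)
{
	if (cpu < 0 || cpu >= MODEL_NR_CPUS)
		return NULL;
	return (struct model_counter *)((char *)&model_percpu_area[cpu] + off);
}
```

The accessor performs the base-address calculation that a plain dereference would skip, which is the step the verifier can now insist on.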
      
      The following is an example similar to the one given in commit
      7472d5a6:
      
        [$ ~] cat test.c
        #define __percpu __attribute__((btf_type_tag("percpu")))
        int foo(int __percpu *arg) {
        	return *arg;
        }
        [$ ~] clang -O2 -g -c test.c
        [$ ~] pahole -JV test.o
        ...
        File test.o:
        [1] INT int size=4 nr_bits=32 encoding=SIGNED
        [2] TYPE_TAG percpu type_id=1
        [3] PTR (anon) type_id=2
        [4] FUNC_PROTO (anon) return=1 args=(3 arg)
        [5] FUNC foo type_id=4
        [$ ~]
      
      for the function argument "int __percpu *arg", its type is described as
      	PTR -> TYPE_TAG(percpu) -> INT
      The kernel can use this information for bpf verification or other
      use cases.
      
      Like commit 7472d5a6, this feature requires clang (>= clang14) and
      pahole (>= 1.23).
      Signed-off-by: Hao Luo <haoluo@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20220304191657.981240-3-haoluo@google.com
    • bpf: Fix checking PTR_TO_BTF_ID in check_mem_access · bff61f6f
      Hao Luo authored
      With the introduction of MEM_USER in
      
       commit c6f1bfe8 ("bpf: reject program if a __user tagged memory accessed in kernel way")
      
      PTR_TO_BTF_ID can be combined with a MEM_USER tag. Therefore, most
      likely, when we compare reg_type against PTR_TO_BTF_ID, we want to use
      the reg's base_type. Previously the check in check_mem_access() wants
      to say: if the reg is BTF_ID but not NULL, the execution flow falls
      into the 'then' branch. But now a reg of (BTF_ID | MEM_USER), which
      should go into the 'then' branch, goes into the 'else'.
      
      The end results before and after this patch are the same: regs tagged
      with MEM_USER get rejected, but not in the way we intended. Fix the
      condition so that the error message is now correct.
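The difference between comparing the full register type and comparing its base type can be sketched with a small model. The constants and helper names below are hypothetical (the numeric values are arbitrary); only the idea of a base type in the low bits with modifier flags above it is taken from the description.

```c
/* Model: low bits hold the base type, higher bits hold modifier flags. */
#define MODEL_BASE_TYPE_BITS 8
#define MODEL_BASE_TYPE_MASK ((1u << MODEL_BASE_TYPE_BITS) - 1)

#define MODEL_PTR_TO_BTF_ID  5u /* arbitrary illustrative value */
#define MODEL_PTR_MAYBE_NULL (1u << (0 + MODEL_BASE_TYPE_BITS))
#define MODEL_MEM_USER       (1u << (3 + MODEL_BASE_TYPE_BITS))

static unsigned int model_base_type(unsigned int type)
{
	return type & MODEL_BASE_TYPE_MASK;
}

/* Shape of the old check: an exact match misses flagged variants. */
static int model_is_btf_id_exact(unsigned int type)
{
	return type == MODEL_PTR_TO_BTF_ID;
}

/* Shape of the fixed check: BTF_ID base type that is not maybe-NULL. */
static int model_is_btf_id_base(unsigned int type)
{
	return model_base_type(type) == MODEL_PTR_TO_BTF_ID &&
	       !(type & MODEL_PTR_MAYBE_NULL);
}
```

A register typed `MODEL_PTR_TO_BTF_ID | MODEL_MEM_USER` fails the exact comparison but passes the base-type comparison, so it reaches the branch that produces the intended error message.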
      
      Before (log from commit 696c3901):
      
        $ ./test_progs -v -n 22/3
        ...
        libbpf: prog 'test_user1': BPF program load failed: Permission denied
        libbpf: prog 'test_user1': -- BEGIN PROG LOAD LOG --
        R1 type=ctx expected=fp
        0: R1=ctx(id=0,off=0,imm=0) R10=fp0
        ; int BPF_PROG(test_user1, struct bpf_testmod_btf_type_tag_1 *arg)
        0: (79) r1 = *(u64 *)(r1 +0)
        func 'bpf_testmod_test_btf_type_tag_user_1' arg0 has btf_id 136561 type STRUCT 'bpf_testmod_btf_type_tag_1'
        1: R1_w=user_ptr_bpf_testmod_btf_type_tag_1(id=0,off=0,imm=0)
        ; g = arg->a;
        1: (61) r1 = *(u32 *)(r1 +0)
        R1 invalid mem access 'user_ptr_'
      
      Now:
      
        libbpf: prog 'test_user1': BPF program load failed: Permission denied
        libbpf: prog 'test_user1': -- BEGIN PROG LOAD LOG --
        R1 type=ctx expected=fp
        0: R1=ctx(id=0,off=0,imm=0) R10=fp0
        ; int BPF_PROG(test_user1, struct bpf_testmod_btf_type_tag_1 *arg)
        0: (79) r1 = *(u64 *)(r1 +0)
        func 'bpf_testmod_test_btf_type_tag_user_1' arg0 has btf_id 104036 type STRUCT 'bpf_testmod_btf_type_tag_1'
        1: R1_w=user_ptr_bpf_testmod_btf_type_tag_1(id=0,ref_obj_id=0,off=0,imm=0)
        ; g = arg->a;
        1: (61) r1 = *(u32 *)(r1 +0)
        R1 is ptr_bpf_testmod_btf_type_tag_1 access user memory: off=0
      
      Note the error message for the reason of rejection.
      
      Fixes: c6f1bfe8 ("bpf: reject program if a __user tagged memory accessed in kernel way")
      Signed-off-by: Hao Luo <haoluo@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20220304191657.981240-2-haoluo@google.com
  5. 05 Mar, 2022 13 commits
    • Merge branch 'Fixes for bad PTR_TO_BTF_ID offset' · 401af75c
      Alexei Starovoitov authored
      Kumar Kartikeya Dwivedi says:
      
      ====================
      
      This set fixes a bug related to bad var_off being permitted for kfunc call in
      case of PTR_TO_BTF_ID, consolidates offset checks for all register types allowed
      as helper or kfunc arguments into a common shared helper, and introduces a
      couple of other checks to harden the kfunc release logic and prevent future
      bugs. Some selftests are also included that fail in absence of these fixes,
      serving as demonstration of the issues being fixed.
      
      Changelog:
      ----------
      v3 -> v4:
      v3: https://lore.kernel.org/bpf/20220304000508.2904128-1-memxor@gmail.com
      
       * Update commit message for __diag patch to say clang instead of LLVM (Nathan)
       * Address nits for check_func_arg_reg_off (Martin)
       * Add comment for fixed_off_ok case, remove is_kfunc check (Martin)
      
      v2 -> v3:
      v2: https://lore.kernel.org/bpf/20220303045029.2645297-1-memxor@gmail.com
      
       * Add my SoB to __diag for clang patch (Nathan)
      
      v1 -> v2:
      v1: https://lore.kernel.org/bpf/20220301065745.1634848-1-memxor@gmail.com
      
       * Put reg->off check for release kfunc inside check_func_arg_reg_off,
         make the check a bit more readable
       * Squash verifier selftests errstr update into patch 3 for bisect (Alexei)
       * Include fix from Nathan for clang warning about missing prototypes
       * Add unified __diag_ignore_all that works for both GCC/LLVM (Alexei)
      
      Older discussion:
      Link: https://lore.kernel.org/bpf/20220219113744.1852259-1-memxor@gmail.com
      
      Kumar Kartikeya Dwivedi (7):
        bpf: Add check_func_arg_reg_off function
        bpf: Fix PTR_TO_BTF_ID var_off check
        bpf: Disallow negative offset in check_ptr_off_reg
        bpf: Harden register offset checks for release helpers and kfuncs
        compiler_types.h: Add unified __diag_ignore_all for GCC/LLVM
        bpf: Replace __diag_ignore with unified __diag_ignore_all
        selftests/bpf: Add tests for kfunc register offset checks
      ====================
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: Add tests for kfunc register offset checks · 8218ccb5
      Kumar Kartikeya Dwivedi authored
      Include a few verifier selftests that test against the problems being
      fixed by previous commits, i.e. release kfuncs always require the
      PTR_TO_BTF_ID fixed offset and var_off to be 0, and a negative offset is
      not permitted and returns a helpful error message.
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20220304224645.3677453-9-memxor@gmail.com
    • bpf: Replace __diag_ignore with unified __diag_ignore_all · 0b206c6d
      Kumar Kartikeya Dwivedi authored
      Currently, -Wmissing-prototypes warning is ignored for GCC, but not
      clang. This leads to a clang build warning in W=1 mode. Since the flag
      used by both compilers is the same, we can use the unified __diag_ignore_all
      macro that works for all supported versions and compilers which have
      __diag macro support (currently GCC >= 8.0, and Clang >= 11.0).
      
      Also add nf_conntrack_bpf.h include to prevent missing prototype warning
      for register_nf_conntrack_bpf.
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20220304224645.3677453-8-memxor@gmail.com
    • compiler_types.h: Add unified __diag_ignore_all for GCC/LLVM · 4d1ea705
      Kumar Kartikeya Dwivedi authored
      Add a __diag_ignore_all macro, to ignore warnings for both GCC and LLVM,
      without having to specify the compiler type and version. By default, GCC
      8 and clang 11 are used. This will be used by bpf subsystem to ignore
      -Wmissing-prototypes warning for functions that are meant to be global
      functions so that they are in vmlinux BTF, but don't have a prototype.
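The effect of such a macro can be approximated in userspace with `_Pragma`, since both GCC and clang accept `#pragma GCC diagnostic`. The `model_diag_*` macros below are hypothetical stand-ins, not the kernel definitions, and they skip the compiler-version checks the real macros carry.

```c
/* Turn a token sequence into a diagnostic pragma. */
#define MODEL_DIAG_STR(s) #s
#define MODEL_DIAG(x) _Pragma(MODEL_DIAG_STR(GCC diagnostic x))

#define model_diag_push() MODEL_DIAG(push)
#define model_diag_pop() MODEL_DIAG(pop)
/* The comment argument only documents why the warning is silenced. */
#define model_diag_ignore_all(option, comment) MODEL_DIAG(ignored option)

model_diag_push()
model_diag_ignore_all("-Wmissing-prototypes",
		      "global on purpose, no prototype in any header")

/* A global function defined without a prior prototype. */
int model_exported_without_prototype(int x)
{
	return x + 1;
}

model_diag_pop()
```

Scoping the suppression with push and pop keeps the warning active for the rest of the translation unit.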
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20220304224645.3677453-7-memxor@gmail.com
    • compiler-clang.h: Add __diag infrastructure for clang · f014a00b
      Nathan Chancellor authored
      Add __diag macros similar to those in compiler-gcc.h, so that warnings
      that need to be adjusted for specific cases but not globally can be
      ignored when building with clang.
      Signed-off-by: Nathan Chancellor <nathan@kernel.org>
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20220304224645.3677453-6-memxor@gmail.com
      
      [ Kartikeya: wrote commit message ]
    • bpf: Harden register offset checks for release helpers and kfuncs · 24d5bb80
      Kumar Kartikeya Dwivedi authored
      Let's ensure that the PTR_TO_BTF_ID reg being passed in to release BPF
      helpers and kfuncs always has its offset set to 0. While not a real
      problem now, there's a very real possibility this will become a problem
      when more and more kfuncs are exposed, and more BPF helpers are added
      which can release PTR_TO_BTF_ID.
      
      Previous commits already protected against non-zero var_off. One of the
      cases we are concerned about now is when we have a type that can be
      returned by e.g. an acquire kfunc:
      
      struct foo {
      	int a;
      	int b;
      	struct bar bar;
      };
      
      ... and struct bar is also a type that can be returned by another
      acquire kfunc.
      
      Then, doing the following sequence:
      
      	struct foo *f = bpf_get_foo(); // acquire kfunc
      	if (!f)
      		return 0;
      	bpf_put_bar(&f->bar); // release kfunc
      
      ... would work with the current code, since btf_struct_ids_match takes
      reg->off into account when matching the pointer type against the release
      kfunc argument type, but would obviously be incorrect, and would most
      likely lead to a kernel crash. A test has been included later to prevent
      regressions in this area.
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20220304224645.3677453-5-memxor@gmail.com
      24d5bb80
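The rule from the commit above can be sketched in plain userspace C. This is a toy model, not verifier code: `struct toy_reg`, `toy_release_arg_ok()` and `toy_off_of_embedded_bar()` are hypothetical names, and the struct members are named so that they do not clash.

```c
#include <stdbool.h>
#include <stddef.h>

struct bar {
	int x;
};

struct foo {
	int a;
	int c;
	struct bar b; /* embedded object of another acquirable type */
};

/* Toy stand-in for the verifier's view of a PTR_TO_BTF_ID register. */
struct toy_reg {
	int off; /* fixed offset from the start of the acquired object */
};

/* A release function must receive exactly the pointer that was acquired. */
static bool toy_release_arg_ok(const struct toy_reg *reg)
{
	return reg->off == 0;
}

/* Fixed offset a register would carry after computing &f->b. */
static int toy_off_of_embedded_bar(void)
{
	return (int)offsetof(struct foo, b);
}
```

A register pointing at `&f->b` has the right type for a `struct bar` release function, but its non-zero offset shows it was never acquired as a `struct bar`, so the toy check rejects it.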
    • Kumar Kartikeya Dwivedi's avatar
      bpf: Disallow negative offset in check_ptr_off_reg · e1fad0ff
      Kumar Kartikeya Dwivedi authored
      check_ptr_off_reg only allows a fixed offset to be set for PTR_TO_BTF_ID,
      where reg->off < 0 doesn't make sense. A negative offset would shift the
      pointer backwards, and would fail later in btf_struct_ids_match or
      btf_struct_walk due to an out-of-bounds access (since the offset is
      interpreted as unsigned).
      
      Improve the verifier by rejecting this case early, with a better error
      message for BPF helpers and kfuncs, by putting a check inside the
      check_func_arg_reg_off function.
      
      Also, update existing verifier selftests to work with the new error string.
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20220304224645.3677453-4-memxor@gmail.com
      e1fad0ff
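The arithmetic behind "interpreted as unsigned" can be shown with a short userspace sketch. All function names here are hypothetical, and the bounds check is only in the spirit of a struct walk, not the kernel's code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reinterpreting a negative fixed offset as unsigned yields a huge value. */
static uint32_t toy_off_as_unsigned(int off)
{
	return (uint32_t)off;
}

/* Toy bounds check for an access into an object of `size` bytes. */
static bool toy_walk_in_bounds(int off, uint32_t size)
{
	return toy_off_as_unsigned(off) < size;
}

/* Early rejection: 0 on success, -1 for a negative fixed offset. */
static int toy_check_fixed_off(int off)
{
	return off < 0 ? -1 : 0;
}
```

With a 32-bit unsigned type, an offset of -8 becomes 4294967288, which fails any realistic bounds check. Rejecting the negative value up front lets the error message name the real cause.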
    • Kumar Kartikeya Dwivedi's avatar
      bpf: Fix PTR_TO_BTF_ID var_off check · 655efe50
      Kumar Kartikeya Dwivedi authored
      When kfunc support was added, check_ctx_reg was called for the PTR_TO_CTX
      register, but no offset checks were made for PTR_TO_BTF_ID. Only
      reg->off was taken into account by btf_struct_ids_match, which protected
      against type mismatch due to non-zero reg->off, but when reg->off was
      zero, a user could set the variable offset of the register and still have
      it accepted as a kfunc argument, leading to a bad pointer being passed
      into the kernel.
      
      Fix this by reusing the helper check_func_arg_reg_off extracted in the
      previous commit, and make one call before checking all supported
      register types. Since the list is maintained, any future changes will be
      taken into account by updating check_func_arg_reg_off. This function
      prevents a non-zero var_off from being set for PTR_TO_BTF_ID, but still
      allows a fixed non-zero reg->off, which is needed for type matching to
      work correctly when using pointer arithmetic.
      
      ARG_DONTCARE is passed as arg_type, since kfunc doesn't support
      accepting an ARG_PTR_TO_ALLOC_MEM without relying on size of parameter
      type from BTF (in case of pointer), or using a mem, len pair. The
      forcing of offset check for ARG_PTR_TO_ALLOC_MEM is done because ringbuf
      helpers obtain the size from the header located at the beginning of the
      memory region, hence any changes to the original pointer shouldn't be
      allowed. In case of kfunc, size is always known, either at verification
      time, or using the length parameter, hence this forcing is not required.
      
      Since this check will happen once already for PTR_TO_CTX, remove the
      check_ptr_off_reg call inside its block.
      
      Fixes: e6ac2450 ("bpf: Support bpf program calling kernel function")
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20220304224645.3677453-3-memxor@gmail.com
      655efe50
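A minimal sketch of the distinction between the fixed and the variable offset follows. It assumes a simplified "tracked number" where bits set in `mask` are unknown; the `toy_` types and functions are hypothetical and only model the rule stated above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Tracked number: bits set in mask are unknown, the rest equal value. */
struct toy_tnum {
	uint64_t value;
	uint64_t mask;
};

struct toy_reg {
	int off;                 /* fixed offset, known at verification time */
	struct toy_tnum var_off; /* variable part of the offset */
};

/* A tracked number is constant when no bit is unknown. */
static bool toy_tnum_is_const(struct toy_tnum t)
{
	return t.mask == 0;
}

/* Accept a fixed reg->off, but require the variable offset to be exactly 0. */
static int toy_check_btf_id_var_off(const struct toy_reg *reg)
{
	if (!toy_tnum_is_const(reg->var_off) || reg->var_off.value != 0)
		return -1;
	return 0;
}
```

A register with `off = 16` and a zero variable offset passes, because the type matching can account for a known offset. Any unknown or non-zero variable part is rejected.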
    • Kumar Kartikeya Dwivedi's avatar
      bpf: Add check_func_arg_reg_off function · 25b35dd2
      Kumar Kartikeya Dwivedi authored
      Lift the list of register types allowed for having fixed and variable
      offsets when passed as helper function arguments into a common helper,
      so that they can be reused for kfunc checks in later commits. Keeping a
      common helper aids maintainability and allows us to follow the same
      consistent rules across helpers and kfuncs. Also, convert check_func_arg
      to use this function.
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20220304224645.3677453-2-memxor@gmail.com
      25b35dd2
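The shape of such a common helper can be sketched as one switch over register types that decides which kinds of offset are acceptable. This is a toy model: the enum, the struct and the classification of `TOY_PTR_TO_STACK` and `TOY_PTR_TO_CTX` are assumptions made for illustration, not the kernel's actual list.

```c
#include <stdbool.h>

enum toy_reg_type {
	TOY_PTR_TO_STACK,  /* assumed: fixed and variable offsets allowed */
	TOY_PTR_TO_BTF_ID, /* fixed offset only */
	TOY_PTR_TO_CTX,    /* assumed: no offset at all */
};

struct toy_reg {
	enum toy_reg_type type;
	int off;          /* fixed offset */
	bool has_var_off; /* true if the variable offset is not known to be 0 */
};

/* One place that encodes the offset rules for every argument register. */
static int toy_check_func_arg_reg_off(const struct toy_reg *reg)
{
	bool fixed_off_ok = false;

	switch (reg->type) {
	case TOY_PTR_TO_STACK:
		return 0; /* any offset is fine for this type */
	case TOY_PTR_TO_BTF_ID:
		fixed_off_ok = true;
		break;
	default:
		break;
	}

	if (reg->has_var_off)
		return -1;
	if (!fixed_off_ok && reg->off != 0)
		return -1;
	return 0;
}
```

Because both helper and kfunc argument checks would call the same function, adding a register type to the switch updates both paths at once.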
    • Alexei Starovoitov's avatar
      Merge branch 'libbpf: support custom SEC() handlers' · caec5495
      Alexei Starovoitov authored
      Andrii Nakryiko says:
      
      ====================
      
      Add ability for user applications and libraries to register custom BPF program
      SEC() handlers. See patch #2 for examples where this is useful.
      
      Patch #1 does some preliminary refactoring to allow exposing program
      init, preload, and attach callbacks as public API. It also establishes
      a protocol to allow optional auto-attach behavior. This will also help the
      case of sometimes auto-attachable uprobes.
      
      v4->v5:
        - API documentation improvements (Daniel);
      v3->v4:
        - init_fn -> prog_setup_fn, preload_fn -> prog_prepare_load_fn (Alexei);
      v2->v3:
        - moved callbacks and cookie into OPTS struct (Alan);
        - added more test scenarios (Alan);
        - address most of Alan's feedback, but kept API name;
      v1->v2:
        - resubmitting due to git send-email screw up.
      
      Cc: Alan Maguire <alan.maguire@oracle.com>
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      caec5495
    • Andrii Nakryiko's avatar
      selftests/bpf: Add custom SEC() handling selftest · aa963bcb
      Andrii Nakryiko authored
      Add a selftest validating various aspects of libbpf's handling of custom
      SEC() handlers. It also demonstrates how libraries can ensure very early
      callbacks registration and unregistration using
      __attribute__((constructor))/__attribute__((destructor)) functions.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Tested-by: Alan Maguire <alan.maguire@oracle.com>
      Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
      Link: https://lore.kernel.org/bpf/20220305010129.1549719-4-andrii@kernel.org
      aa963bcb
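The constructor/destructor technique mentioned above can be sketched without libbpf. The register and unregister functions below are hypothetical stand-ins for whatever a library would call; only the `__attribute__((constructor))` and `__attribute__((destructor))` mechanism is the point.

```c
#include <stdbool.h>

static bool toy_handler_registered;

/* Hypothetical stand-ins for a library's register/unregister calls. */
static void toy_register_handler(void)
{
	toy_handler_registered = true;
}

static void toy_unregister_handler(void)
{
	toy_handler_registered = false;
}

/* Runs before main(), so the handler exists before any object is opened. */
__attribute__((constructor))
static void toy_lib_init(void)
{
	toy_register_handler();
}

/* Runs after main() returns or exit() is called. */
__attribute__((destructor))
static void toy_lib_fini(void)
{
	toy_unregister_handler();
}

/* Query used by the application to see whether registration happened. */
bool toy_handler_is_registered(void)
{
	return toy_handler_registered;
}
```

An application linking such a library does not need to call any init function. By the time `main()` starts, `toy_handler_is_registered()` already returns true.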
    • Andrii Nakryiko's avatar
      libbpf: Support custom SEC() handlers · 697f104d
      Andrii Nakryiko authored
      Allow registering and unregistering custom handlers for BPF programs.
      This allows user applications and libraries to plug into libbpf's
      declarative SEC() definition handling logic. It makes it possible to
      offload complex and intricate custom logic into external libraries, but
      still provide a great user experience.
      
      One such example is a USDT handling library, which has a lot of code and
      complexity which doesn't make sense to put into libbpf directly, but it
      would be really great for users to be able to specify BPF programs with
      something like SEC("usdt/<path-to-binary>:<usdt_provider>:<usdt_name>")
      and have the correct BPF program type set (BPF_PROG_TYPE_KPROBE, as it is
      a uprobe) and even support BPF skeleton's auto-attach logic.
      
      In some cases, it might even be a good idea to override libbpf's default
      handling, like for SEC("perf_event") programs. With a custom library, it's
      possible to extend the logic to support specifying the perf event
      specification right there in the SEC() definition without burdening libbpf
      with lots of custom logic or extra library dependencies (e.g., libpfm4).
      With the current patch it's possible to override libbpf's
      SEC("perf_event") handling and specify a completely custom one.
      
      Further, it's possible to specify a generic fallback handler for any
      SEC() that doesn't match any other custom or standard libbpf handlers.
      This makes it possible to accommodate whatever legacy use cases there
      might be, if necessary.
      
      See doc comments for libbpf_register_prog_handler() and
      libbpf_unregister_prog_handler() for detailed semantics.
      
      This patch also bumps libbpf development version to v0.8 and adds new
      APIs there.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Tested-by: Alan Maguire <alan.maguire@oracle.com>
      Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
      Link: https://lore.kernel.org/bpf/20220305010129.1549719-3-andrii@kernel.org
      697f104d
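The lookup order described above (specific handlers first, generic fallback last) can be illustrated with a toy registry in plain C. This is not libbpf's implementation or API: `toy_register()`, `toy_find_handler()` and the simple prefix matching are assumptions made for the sketch. See the doc comments for libbpf_register_prog_handler() for the real semantics.

```c
#include <stddef.h>
#include <string.h>

#define TOY_MAX_HANDLERS 8

struct toy_handler {
	const char *sec_prefix; /* NULL marks the generic fallback */
	int id;                 /* opaque value identifying the handler */
};

static struct toy_handler toy_handlers[TOY_MAX_HANDLERS];
static int toy_handler_cnt;

/* Register a handler; returns its slot index or -1 if the table is full. */
static int toy_register(const char *sec_prefix, int id)
{
	if (toy_handler_cnt == TOY_MAX_HANDLERS)
		return -1;
	toy_handlers[toy_handler_cnt].sec_prefix = sec_prefix;
	toy_handlers[toy_handler_cnt].id = id;
	return toy_handler_cnt++;
}

/* Return the id of the handler for sec_name, or -1 if nothing matches. */
static int toy_find_handler(const char *sec_name)
{
	int fallback = -1;

	for (int i = 0; i < toy_handler_cnt; i++) {
		const char *p = toy_handlers[i].sec_prefix;

		if (!p) {
			/* Remember the fallback, but keep looking for a match. */
			fallback = toy_handlers[i].id;
			continue;
		}
		if (strncmp(sec_name, p, strlen(p)) == 0)
			return toy_handlers[i].id;
	}
	return fallback;
}
```

With a handler registered for the prefix "usdt/" and a fallback registered with NULL, a section named "usdt/./a.out:prov:name" resolves to the USDT handler, while any unrecognised section resolves to the fallback.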
    • Andrii Nakryiko's avatar
      libbpf: Allow BPF program auto-attach handlers to bail out · 4fa5bcfe
      Andrii Nakryiko authored
      Allow some BPF program types to support auto-attach only in a subset of
      cases. Currently, if some BPF program type specifies an attach callback,
      it is assumed that during the skeleton attach operation all such programs
      either successfully attach or the entire skeleton attachment fails. If
      some program doesn't support auto-attachment from a skeleton, such BPF
      program types shouldn't have an attach callback specified.
      
      This is limiting for cases when, depending on how full the SEC("")
      definition is, there could either be enough details to support
      auto-attach or there might not be and user has to use some specific API
      to provide more details at runtime.
      
      One specific example of such desired behavior might be SEC("uprobe"). If
      it's specified as just SEC("uprobe"), auto-attach isn't possible. But if
      it's SEC("uprobe/<some_binary>:<some_func>") then there are enough details
      to support auto-attach. Note that there is a somewhat subtle difference
      between the auto-attach behavior of a BPF skeleton and using the "generic"
      bpf_program__attach(prog) (which uses the same attach handlers under the
      covers). A skeleton allows some programs within a bpf_object to not have
      auto-attach implemented and doesn't treat that as an error. Instead such
      BPF programs are just skipped during the skeleton's (optional) attach step.
      bpf_program__attach(), on the other hand, is called when the user *expects*
      auto-attach to work, so if the specified program doesn't implement or
      doesn't support auto-attach functionality, that will be treated as an
      error.
      
      Another improvement to the way libbpf is handling SEC()s would be to not
      require providing a dummy kernel function name for kprobe. Currently,
      SEC("kprobe/whatever") is necessary even if the actual kernel function is
      determined by the user at runtime and bpf_program__attach_kprobe() is used
      to specify it. With the changes in this patch, it's possible to support
      both SEC("kprobe") and SEC("kprobe/<actual_kernel_function>"), while only
      in the latter case auto-attach will be performed. In the former one, such
      a kprobe will be skipped during the skeleton attach operation.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Tested-by: Alan Maguire <alan.maguire@oracle.com>
      Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
      Link: https://lore.kernel.org/bpf/20220305010129.1549719-2-andrii@kernel.org
      4fa5bcfe
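The difference between the two attach paths can be modelled in a few lines of userspace C. This is a toy sketch, not libbpf code: all `toy_` names are hypothetical, and the convention that a handler signals "cannot auto-attach" by returning 0 with a NULL link is an assumption made for the example.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct toy_link {
	int fd;
};

struct toy_prog {
	const char *sec_name;
	struct toy_link link_storage;
};

/* Toy handler: returning 0 with *link == NULL means "bail out, not an error". */
static int toy_attach_kprobe(struct toy_prog *prog, struct toy_link **link)
{
	*link = NULL;
	if (strcmp(prog->sec_name, "kprobe") == 0)
		return 0; /* no function name given, cannot auto-attach */
	if (strncmp(prog->sec_name, "kprobe/", 7) != 0)
		return -EINVAL;
	prog->link_storage.fd = 3; /* pretend the attach succeeded */
	*link = &prog->link_storage;
	return 0;
}

/* Skeleton-style attach: skip programs that bail out, count the rest. */
static int toy_skeleton_attach(struct toy_prog *progs, size_t n)
{
	int attached = 0;

	for (size_t i = 0; i < n; i++) {
		struct toy_link *link;
		int err = toy_attach_kprobe(&progs[i], &link);

		if (err)
			return err;
		if (link)
			attached++;
	}
	return attached;
}

/* Explicit attach: the caller expects a link, so a bail-out is an error. */
static int toy_program_attach(struct toy_prog *prog, struct toy_link **link)
{
	int err = toy_attach_kprobe(prog, link);

	if (err)
		return err;
	return *link ? 0 : -EOPNOTSUPP;
}
```

Given one program with SEC("kprobe") and one with SEC("kprobe/do_unlinkat"), the skeleton-style loop attaches only the second and reports success, while the explicit call on the first returns an error.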