1. 29 Jan, 2021 2 commits
  2. 28 Jan, 2021 2 commits
  3. 27 Jan, 2021 2 commits
  4. 26 Jan, 2021 1 commit
  5. 25 Jan, 2021 17 commits
  6. 22 Jan, 2021 3 commits
  7. 21 Jan, 2021 4 commits
  8. 20 Jan, 2021 9 commits
    • Merge branch 'bpf: misc performance improvements for cgroup' · 636d549f
      Alexei Starovoitov authored
      Stanislav Fomichev says:
      
      ====================
      
      First patch adds custom getsockopt for TCP_ZEROCOPY_RECEIVE
      to remove kmalloc and lock_sock overhead from the data path.
      
      Second patch removes kzalloc/kfree from getsockopt for the common cases.
      
      Third patch switches cgroup_bpf_enabled to be per-attach type
      to add overhead only for the cgroup attach types used on the system.
      
      No visible user-side changes.
      
      v9:
      - include linux/tcp.h instead of netinet/tcp.h in sockopt_sk.c
      - note that v9 depends on the commit 4be34f3d ("bpf: Don't leak
        memory in bpf getsockopt when optlen == 0") from bpf tree
      
      v8:
      - add bpi.h to tools/include/uapi in the same patch (Martin KaFai Lau)
      - kmalloc instead of kzalloc when exporting buffer (Martin KaFai Lau)
      - note that v8 depends on the commit 4be34f3d ("bpf: Don't leak
        memory in bpf getsockopt when optlen == 0") from bpf tree
      
      v7:
      - add comment about buffer contents for retval != 0 (Martin KaFai Lau)
      - export tcp.h into tools/include/uapi (Martin KaFai Lau)
      - note that v7 depends on the commit 4be34f3d ("bpf: Don't leak
        memory in bpf getsockopt when optlen == 0") from bpf tree
      
      v6:
      - avoid indirect cost for new bpf_bypass_getsockopt (Eric Dumazet)
      
      v5:
      - reorder patches to reduce the churn (Martin KaFai Lau)
      
      v4:
      - update performance numbers
      - bypass_bpf_getsockopt (Martin KaFai Lau)
      
      v3:
      - remove extra newline, add comment about sizeof tcp_zerocopy_receive
        (Martin KaFai Lau)
      - add another patch to remove lock_sock overhead from
        TCP_ZEROCOPY_RECEIVE; technically, this makes patch #1 obsolete,
        but I'd still prefer to keep it to help with other socket
        options
      
      v2:
      - perf numbers for getsockopt kmalloc reduction (Song Liu)
      - (sk) in BPF_CGROUP_PRE_CONNECT_ENABLED (Song Liu)
      - 128 -> 64 buffer size, BUILD_BUG_ON (Martin KaFai Lau)
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Split cgroup_bpf_enabled per attach type · a9ed15da
      Stanislav Fomichev authored
      When we attach any cgroup hook, the rest (even if unused/unattached) start
      to contribute small overhead. In particular, the one we want to avoid is
      __cgroup_bpf_run_filter_skb which does two redirections to get to
      the cgroup and pushes/pulls skb.
      
      Let's split cgroup_bpf_enabled to be per-attach to make sure
      only used attach types trigger.
      
      I've dropped some existing high-level cgroup_bpf_enabled checks in some
      places because the BPF_PROG_CGROUP_XXX_RUN macros usually have another
      cgroup_bpf_enabled check.
      
      I also had to copy-paste BPF_CGROUP_RUN_SA_PROG_LOCK for
      GETPEERNAME/GETSOCKNAME because type for cgroup_bpf_enabled[type]
      has to be constant and known at compile time.
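
The per-attach-type split described above can be modelled in user space as a minimal sketch. All names here (`demo_attach_type`, `demo_enabled`, and so on) are hypothetical, and plain counters stand in for whatever mechanism the kernel actually uses; the point is only that a hook checks the flag for its own type rather than one global flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical attach types; the kernel's real list is much longer. */
enum demo_attach_type {
    DEMO_INET_INGRESS,
    DEMO_INET_EGRESS,
    DEMO_GETSOCKOPT,
    DEMO_MAX_ATTACH_TYPE,
};

/* One counter per attach type instead of a single global "enabled" flag. */
static int demo_enabled[DEMO_MAX_ATTACH_TYPE];

static void demo_attach(enum demo_attach_type type)
{
    assert(type < DEMO_MAX_ATTACH_TYPE);
    demo_enabled[type]++;
}

static void demo_detach(enum demo_attach_type type)
{
    assert(type < DEMO_MAX_ATTACH_TYPE && demo_enabled[type] > 0);
    demo_enabled[type]--;
}

/* Fast-path check: only hooks whose own type is attached do extra work. */
static bool demo_hook_enabled(enum demo_attach_type type)
{
    return demo_enabled[type] > 0;
}
```

With this layout, attaching a getsockopt program leaves the ingress and egress checks false, so the skb path pays nothing for it.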
      Signed-off-by: Stanislav Fomichev <sdf@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Song Liu <songliubraving@fb.com>
      Link: https://lore.kernel.org/bpf/20210115163501.805133-4-sdf@google.com
    • bpf: Try to avoid kzalloc in cgroup/{s,g}etsockopt · 20f2505f
      Stanislav Fomichev authored
      When we attach a bpf program to cgroup/getsockopt any other getsockopt()
      syscall starts incurring kzalloc/kfree cost.
      
      Let's add a small buffer on the stack and use it for small (majority)
      {s,g}etsockopt values. The buffer is small enough to fit into
      the cache line and cover the majority of simple options (most
      of them are 4 byte ints).
      
      It seems natural to do the same for setsockopt, but it's a bit more
      involved when the BPF program modifies the data (where we have to
      kmalloc). The assumption is that for the majority of setsockopt
      calls (which are doing pure BPF options or apply policy) this
      will bring some benefit as well.
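
The on-stack buffer idea can be sketched in user space as follows. This is an illustration under assumptions, not the kernel code: the struct and function names are hypothetical, `calloc` stands in for kzalloc, and the 64-byte size is taken from the "128 -> 64 buffer size" note in the v2 changelog.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_STACK_BUF_SIZE 64  /* assumed size, see the v2 changelog note */

/* Must not be copied by value: data may point into the struct itself. */
struct demo_sockopt_buf {
    unsigned char stack[DEMO_STACK_BUF_SIZE];
    unsigned char *data;   /* points at stack[] or at a heap allocation */
    size_t len;
};

/* Returns 0 on success, -1 if the heap fallback fails. */
static int demo_buf_alloc(struct demo_sockopt_buf *b, size_t len)
{
    assert(b != NULL);
    b->len = len;
    if (len <= sizeof(b->stack)) {
        /* Common case: small option value, no allocator call at all. */
        memset(b->stack, 0, sizeof(b->stack));
        b->data = b->stack;
        return 0;
    }
    /* Rare case: large option value, fall back to the heap. */
    b->data = calloc(1, len);
    return b->data ? 0 : -1;
}

/* Only meaningful between a successful alloc and the matching free. */
static int demo_buf_on_heap(const struct demo_sockopt_buf *b)
{
    return b->data != b->stack;
}

static void demo_buf_free(struct demo_sockopt_buf *b)
{
    if (demo_buf_on_heap(b))
        free(b->data);
    b->data = NULL;
}
```

A 4-byte integer option takes the first branch and never reaches the allocator, which is the saving the perf output below attributes to __kmalloc.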
      
      Without this patch (we remove about 1% __kmalloc):
           3.38%     0.07%  tcp_mmap  [kernel.kallsyms]  [k] __cgroup_bpf_run_filter_getsockopt
                  |
                   --3.30%--__cgroup_bpf_run_filter_getsockopt
                             |
                              --0.81%--__kmalloc
      Signed-off-by: Stanislav Fomichev <sdf@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20210115163501.805133-3-sdf@google.com
    • bpf: Remove extra lock_sock for TCP_ZEROCOPY_RECEIVE · 9cacf81f
      Stanislav Fomichev authored
      Add custom implementation of getsockopt hook for TCP_ZEROCOPY_RECEIVE.
      We skip generic hooks for TCP_ZEROCOPY_RECEIVE and have a custom
      call in do_tcp_getsockopt using the on-stack data. This removes
      3% overhead for locking/unlocking the socket.
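
The "skip the generic hook for one option" decision can be sketched as a small predicate. The names below are hypothetical and the sketch only models the dispatch decision, not the hook itself; the numeric constants are redefined locally so the example stands alone, and are assumed to match the Linux uapi values.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to match the Linux uapi values; redefined to keep this standalone. */
#define DEMO_SOL_TCP              6
#define DEMO_TCP_NODELAY          1
#define DEMO_TCP_ZEROCOPY_RECEIVE 35

/* Hypothetical predicate: should the generic cgroup getsockopt hook be skipped? */
static bool demo_bypass_getsockopt(int level, int optname)
{
    return level == DEMO_SOL_TCP && optname == DEMO_TCP_ZEROCOPY_RECEIVE;
}

enum demo_path {
    DEMO_GENERIC_HOOK,  /* generic path: takes the socket lock, copies the buffer */
    DEMO_CUSTOM_HOOK,   /* option-specific path: reuses data already on the stack */
};

static enum demo_path demo_getsockopt_path(int level, int optname)
{
    if (demo_bypass_getsockopt(level, optname))
        return DEMO_CUSTOM_HOOK;
    return DEMO_GENERIC_HOOK;
}
```

Every other option, such as TCP_NODELAY, still goes through the generic hook, so the behaviour of existing programs is unchanged.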
      
      Without this patch:
           3.38%     0.07%  tcp_mmap  [kernel.kallsyms]  [k] __cgroup_bpf_run_filter_getsockopt
                  |
                   --3.30%--__cgroup_bpf_run_filter_getsockopt
                             |
                              --0.81%--__kmalloc
      
      With the patch applied:
           0.52%     0.12%  tcp_mmap  [kernel.kallsyms]  [k] __cgroup_bpf_run_filter_getsockopt_kern
      
      Note, exporting uapi/tcp.h requires removing netinet/tcp.h
      from test_progs.h because those headers have conflicting
      definitions.
      Signed-off-by: Stanislav Fomichev <sdf@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20210115163501.805133-2-sdf@google.com
    • bpf: Permit size-0 datasec · 13ca51d5
      Yonghong Song authored
      LLVM patch https://reviews.llvm.org/D84002 permits emitting
      an empty rodata datasec if the elf .rodata section
      contains read-only data from local variables. These
      local variables will not be emitted as BTF_KIND_VARs
      since llvm converts them to
      static variables with private linkage without debuginfo
      types. Such an empty rodata datasec will make
      skeleton code generation easy since for skeleton
      a rodata struct will be generated if there is a
      .rodata elf section. The existence of a rodata
      btf datasec is also consistent with the existence
      of a rodata map created by libbpf.
      
      The btf with such an empty rodata datasec will fail
      in the kernel though as kernel will reject a datasec
      with zero vlen and zero size. For example, for the below code,
          int sys_enter(void *ctx)
          {
             int fmt[6] = {1, 2, 3, 4, 5, 6};
             int dst[6];
      
             bpf_probe_read(dst, sizeof(dst), fmt);
             return 0;
          }
      We got the below btf (bpftool btf dump ./test.o):
          [1] PTR '(anon)' type_id=0
          [2] FUNC_PROTO '(anon)' ret_type_id=3 vlen=1
                  'ctx' type_id=1
          [3] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
          [4] FUNC 'sys_enter' type_id=2 linkage=global
          [5] INT 'char' size=1 bits_offset=0 nr_bits=8 encoding=SIGNED
          [6] ARRAY '(anon)' type_id=5 index_type_id=7 nr_elems=4
          [7] INT '__ARRAY_SIZE_TYPE__' size=4 bits_offset=0 nr_bits=32 encoding=(none)
          [8] VAR '_license' type_id=6, linkage=global-alloc
          [9] DATASEC '.rodata' size=0 vlen=0
          [10] DATASEC 'license' size=0 vlen=1
                  type_id=8 offset=0 size=4
      When loading the ./test.o to the kernel with bpftool,
      we see the following error:
          libbpf: Error loading BTF: Invalid argument(22)
          libbpf: magic: 0xeb9f
          ...
          [6] ARRAY (anon) type_id=5 index_type_id=7 nr_elems=4
          [7] INT __ARRAY_SIZE_TYPE__ size=4 bits_offset=0 nr_bits=32 encoding=(none)
          [8] VAR _license type_id=6 linkage=1
          [9] DATASEC .rodata size=24 vlen=0 vlen == 0
          libbpf: Error loading .BTF into kernel: -22. BTF is optional, ignoring.
      
      Basically, libbpf changed .rodata datasec size to 24 since elf .rodata
      section size is 24. The kernel then rejected the BTF since vlen = 0.
      Note that the above kernel verifier failure can be worked around by
      changing local variable "fmt" to a static or global, optionally const, variable.
      
      This patch permits a datasec with vlen = 0 in kernel.
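
The behaviour change can be modelled as a before/after pair of checks. This is a deliberately small sketch with hypothetical names: it covers only the vlen == 0 rejection shown in the log above and makes no claim about any other validation the kernel performs on a datasec.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical, simplified view of a BTF datasec. */
struct demo_datasec {
    const char *name;
    unsigned int size;  /* section size in bytes */
    unsigned int vlen;  /* number of variables listed in the datasec */
};

/* Before: a datasec that lists no variables is rejected with -EINVAL. */
static int demo_check_before(const struct demo_datasec *d)
{
    if (d->vlen == 0)
        return -EINVAL;
    return 0;
}

/* After: vlen == 0 on its own is no longer a reason to reject. */
static int demo_check_after(const struct demo_datasec *d)
{
    (void)d;
    return 0;
}
```

Applied to the `.rodata` entry from the log (size=24, vlen=0), the first check returns -EINVAL, matching the "Invalid argument(22)" error, and the second accepts it.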
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20210119153519.3901963-1-yhs@fb.com
    • Merge branch 'Allow attaching to bare tracepoints' · 71ee10e2
      Alexei Starovoitov authored
      Qais Yousef says:
      
      ====================
      
      Changes in v3:
      	* Fix not returning error value correctly in
      	  trigger_module_test_write() (Yonghong)
      	* Add Yonghong acked-by to patch 1.
      
      Changes in v2:
      	* Fix compilation error. (Andrii)
      	* Make the new test use write() instead of read() (Andrii)
      
      Add some missing glue logic to teach bpf about bare tracepoints - tracepoints
      without any trace event associated with them.
      
      Bare tracepoints are declared with DECLARE_TRACE(). Full tracepoints are declared
      with TRACE_EVENT().
      
      BPF can attach to these tracepoints as RAW_TRACEPOINT() only, as there are no
      events in tracefs created for them.
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests: bpf: Add a new test for bare tracepoints · 407be922
      Qais Yousef authored
      Reuse module_attach infrastructure to add a new bare tracepoint to check
      we can attach to it as a raw tracepoint.
      Signed-off-by: Qais Yousef <qais.yousef@arm.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20210119122237.2426878-3-qais.yousef@arm.com
    • Merge branch 'bpf,x64: implement jump padding in jit' · 86e6b4e9
      Alexei Starovoitov authored
      Gary Lin says:
      
      ====================
      This patch series implements jump padding in the x64 jit to cover some
      corner cases that used to consume more than 20 jit passes and caused
      a failure.
      
      v4:
        - Add the detailed comments about the possible padding bytes
        - Add the second test case which triggers jmp_cond padding and imm32 nop
          jmp padding.
        - Add the new test case as another subprog
      
      v3:
        - Copy the instructions of prologue separately or the size calculation
          of the first BPF instruction would include the prologue.
        - Replace WARN_ONCE() with pr_err() and EFAULT
        - Use MAX_PASSES in the for loop condition check
        - Remove the "padded" flag from x64_jit_data. For the extra pass of
          subprogs, padding is always enabled since it won't hurt the images
          that converge without padding.
      
      v2:
        - Simplify the sample code in the commit description and provide the
          jit code
        - Check the expected padding bytes with WARN_ONCE
        - Move the 'padded' flag to 'struct x64_jit_data'
        - Remove the EXPECTED_FAIL flag from bpf_fill_maxinsns11() in test_bpf
        - Add 2 verifier tests
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • trace: bpf: Allow bpf to attach to bare tracepoints · 6939f4ef
      Qais Yousef authored
      Some subsystems only have bare tracepoints (a tracepoint with no
      associated trace event) to avoid the problem of trace events being an
      ABI that can't be changed.
      
      From a bpf perspective, bare tracepoints are what it calls
      RAW_TRACEPOINT().
      
      Since bpf assumed there's a 1:1 mapping, it relied on hooking into the
      DEFINE_EVENT() macro to create the bpf mapping of the tracepoints. Since
      bare tracepoints use DECLARE_TRACE() to create the tracepoint, bpf had
      no knowledge of their existence.
      
      By teaching bpf_probe.h to parse DECLARE_TRACE() in a similar fashion to
      DEFINE_EVENT(), bpf can find and attach to the new raw tracepoints.
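
The general technique of generating extra glue from a second macro can be illustrated with an X-macro list. This is a user-space analogy only: the list, the `demo_raw_map` struct, and both tracepoint names are hypothetical, and the real header works by redefining the tracing macros rather than by using a list like this.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tracepoint list: FULL models TRACE_EVENT()/DEFINE_EVENT(),
 * BARE models DECLARE_TRACE(). */
#define DEMO_TRACEPOINTS(FULL, BARE) \
    FULL(demo_full_event)            \
    BARE(demo_bare_tp)

struct demo_raw_map {
    const char *name;
};

#define DEMO_EMIT(name) { #name },
#define DEMO_SKIP(name)

/* Before: only full events generate a map entry, bare ones are invisible. */
static const struct demo_raw_map demo_maps_before[] = {
    DEMO_TRACEPOINTS(DEMO_EMIT, DEMO_SKIP)
    { NULL }
};

/* After: the bare-tracepoint macro generates a map entry as well. */
static const struct demo_raw_map demo_maps_after[] = {
    DEMO_TRACEPOINTS(DEMO_EMIT, DEMO_EMIT)
    { NULL }
};

/* Look a tracepoint up by name in a NULL-terminated map table. */
static const struct demo_raw_map *demo_find(const struct demo_raw_map *maps,
                                            const char *name)
{
    assert(maps != NULL && name != NULL);
    for (; maps->name; maps++)
        if (strcmp(maps->name, name) == 0)
            return maps;
    return NULL;
}
```

Looking up `demo_bare_tp` fails in the first table and succeeds in the second, which mirrors how attaching by name only works once the map entry exists.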
      
      Enabling that comes with the contract that changes to raw tracepoints
      don't constitute a regression if they break existing bpf programs.
      We need the ability to continue to morph and modify these raw
      tracepoints without worrying about any ABI.
      
      Update Documentation/bpf/bpf_design_QA.rst to document this contract.
      Signed-off-by: Qais Yousef <qais.yousef@arm.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20210119122237.2426878-2-qais.yousef@arm.com