1. 05 Apr, 2019 1 commit
    • Alexei Starovoitov's avatar
      samples/bpf: fix build with new clang · 636e78b1
      Alexei Starovoitov authored
      clang started to error on invalid asm clobber usage in x86 headers
      and many bpf program samples failed to build with the message:
      
        CLANG-bpf  /data/users/ast/bpf-next/samples/bpf/xdp_redirect_kern.o
      In file included from /data/users/ast/bpf-next/samples/bpf/xdp_redirect_kern.c:14:
      In file included from ../include/linux/in.h:23:
      In file included from ../include/uapi/linux/in.h:24:
      In file included from ../include/linux/socket.h:8:
      In file included from ../include/linux/uio.h:14:
      In file included from ../include/crypto/hash.h:16:
      In file included from ../include/linux/crypto.h:26:
      In file included from ../include/linux/uaccess.h:5:
      In file included from ../include/linux/sched.h:15:
      In file included from ../include/linux/sem.h:5:
      In file included from ../include/uapi/linux/sem.h:5:
      In file included from ../include/linux/ipc.h:9:
      In file included from ../include/linux/refcount.h:72:
      ../arch/x86/include/asm/refcount.h:72:36: error: asm-specifier for input or output variable conflicts with asm clobber list
                                               r->refs.counter, e, "er", i, "cx");
                                                                            ^
      ../arch/x86/include/asm/refcount.h:86:27: error: asm-specifier for input or output variable conflicts with asm clobber list
                                               r->refs.counter, e, "cx");
                                                                   ^
      2 errors generated.
      
      Override volatile() to workaround the problem.
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      636e78b1
  2. 04 Apr, 2019 2 commits
  3. 03 Apr, 2019 11 commits
    • Daniel Borkmann's avatar
      Merge branch 'bpf-verifier-scalability' · cc441a69
      Daniel Borkmann authored
      Alexei Starovoitov says:
      
      ====================
      v1->v2:
      - fixed typo in patch 1
      - added a patch to convert kcalloc to kvcalloc
      - added a patch to verbose 16-bit jump offset check
      - added a test with 1m insns
      
      This patch set is the first step to be able to accept large programs.
      The verifier still suffers from its brute force algorithm and
      large programs can easily hit 1M insn_processed limit.
      A lot more work is necessary to be able to verify large programs.
      
      v1:
      Realize two key ideas to speed up verification speed by ~20 times
      1. every 'branching' instructions records all verifier states.
         not all of them are useful for search pruning.
         add a simple heuristic to keep states that were successful in search pruning
         and remove those that were not
      2. mark_reg_read walks parentage chain of registers to mark parents as LIVE_READ.
         Once the register is marked there is no need to remark it again in the future.
         Hence stop walking the chain once first LIVE_READ is seen.
      
      1st optimization gives 10x speed up on large programs
      and 2nd optimization reduces the cost of mark_reg_read from ~40% of cpu to <1%.
      Combined the deliver ~20x speedup on large programs.
      
      Faster and bounded verification time allows to increase insn_processed
      limit to 1 million from 130k.
      Worst case it takes 1/10 of a second to process that many instructions
      and peak memory consumption is peak_states * sizeof(struct bpf_verifier_state)
      which is around ~5Mbyte.
      
      Increase insn_per_program limit for root to insn_processed limit.
      
      Add verification stats and stress tests for verifier scalability.
      ====================
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      cc441a69
    • Alexei Starovoitov's avatar
      selftests/bpf: synthetic tests to push verifier limits · 8aa2d4b4
      Alexei Starovoitov authored
      Add a test to generate 1m ld_imm64 insns to stress the verifier.
      
      Bump the size of fill_ld_abs_vlan_push_pop test from 4k to 29k
      and jump_around_ld_abs from 4k to 5.5k.
      Larger sizes are not possible due to 16-bit offset encoding
      in jump instructions.
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      8aa2d4b4
    • Alexei Starovoitov's avatar
      selftests/bpf: add few verifier scale tests · e5e7a8f2
      Alexei Starovoitov authored
      Add 3 basic tests that stress verifier scalability.
      
      test_verif_scale1.c calls non-inlined jhash() function 90 times on
      different position in the packet.
      This test simulates network packet parsing.
      jhash function is ~140 instructions and main program is ~1200 insns.
      
      test_verif_scale2.c force inlines jhash() function 90 times.
      This program is ~15k instructions long.
      
      test_verif_scale3.c calls non-inlined jhash() function 90 times on
      But this time jhash has to process 32-bytes from the packet
      instead of 14-bytes in tests 1 and 2.
      jhash function is ~230 insns and main program is ~1200 insns.
      
      $ test_progs -s
      can be used to see verifier stats.
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      e5e7a8f2
    • Alexei Starovoitov's avatar
      libbpf: teach libbpf about log_level bit 2 · da11b417
      Alexei Starovoitov authored
      Allow bpf_prog_load_xattr() to specify log_level for program loading.
      
      Teach libbpf to accept log_level with bit 2 set.
      
      Increase default BPF_LOG_BUF_SIZE from 256k to 16M.
      There is no downside to increase it to a maximum allowed by old kernels.
      Existing 256k limit caused ENOSPC errors and users were not able to see
      verifier error which is printed at the end of the verifier log.
      
      If ENOSPC is hit, double the verifier log and try again to capture
      the verifier error.
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      da11b417
    • Alexei Starovoitov's avatar
      bpf: increase verifier log limit · 7a9f5c65
      Alexei Starovoitov authored
      The existing 16Mbyte verifier log limit is not enough for log_level=2
      even for small programs. Increase it to 1Gbyte.
      Note it's not a kernel memory limit.
      It's an amount of memory user space provides to store
      the verifier log. The kernel populates it 1k at a time.
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Reviewed-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      7a9f5c65
    • Alexei Starovoitov's avatar
      bpf: increase complexity limit and maximum program size · c04c0d2b
      Alexei Starovoitov authored
      Large verifier speed improvements allow to increase
      verifier complexity limit.
      Now regardless of the program composition and its size it takes
      little time for the verifier to hit insn_processed limit.
      On typical x86 machine non-debug kernel processes 1M instructions
      in 1/10 of a second.
      (before these speed improvements specially crafted programs
      could be hitting multi-second verification times)
      Full kasan kernel with debug takes ~1 second for the same 1M insns.
      Hence bump the BPF_COMPLEXITY_LIMIT_INSNS limit to 1M.
      Also increase the number of instructions per program
      from 4k to internal BPF_COMPLEXITY_LIMIT_INSNS limit.
      4k limit was confusing to users, since small programs with hundreds
      of insns could be hitting BPF_COMPLEXITY_LIMIT_INSNS limit.
      Sometimes adding more insns and bpf_trace_printk debug statements
      would make the verifier accept the program while removing
      code would make the verifier reject it.
      Some user space application started to add #define MAX_FOO to
      their programs and do:
        MAX_FOO=100;
      again:
        compile with MAX_FOO;
        try to load;
        if (fails_to_load) { reduce MAX_FOO; goto again; }
      to be able to fit maximum amount of processing into single program.
      Other users artificially split their single program into a set of programs
      and use all 32 iterations of tail_calls to increase compute limits.
      And the most advanced folks used unlimited tc-bpf filter list
      to execute many bpf programs.
      Essentially the users managed to workaround 4k insn limit.
      This patch removes the limit for root programs from uapi.
      BPF_COMPLEXITY_LIMIT_INSNS is the kernel internal limit
      and success to load the program no longer depends on program size,
      but on 'smartness' of the verifier only.
      The verifier will continue to get smarter with every kernel release.
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      c04c0d2b
    • Alexei Starovoitov's avatar
      bpf: verbose jump offset overflow check · 4f73379e
      Alexei Starovoitov authored
      Larger programs may trigger 16-bit jump offset overflow check
      during instruction patching. Make this error verbose otherwise
      users cannot decipher error code without printks in the verifier.
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      4f73379e
    • Alexei Starovoitov's avatar
      bpf: convert temp arrays to kvcalloc · 71dde681
      Alexei Starovoitov authored
      Temporary arrays used during program verification need to be vmalloc-ed
      to support large bpf programs.
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      71dde681
    • Alexei Starovoitov's avatar
      bpf: improve verification speed by not remarking live_read · 25af32da
      Alexei Starovoitov authored
      With large verifier speed improvement brought by the previous patch
      mark_reg_read() becomes the hottest function during verification.
      On a typical program it consumes 40% of cpu.
      mark_reg_read() walks parentage chain of registers to mark parents as LIVE_READ.
      Once the register is marked there is no need to remark it again in the future.
      Hence stop walking the chain once first LIVE_READ is seen.
      This optimization drops mark_reg_read() time from 40% of cpu to <1%
      and overall 2x improvement of verification speed.
      For some programs the longest_mark_read_walk counter improves from ~500 to ~5
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Reviewed-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: default avatarEdward Cree <ecree@solarflare.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      25af32da
    • Alexei Starovoitov's avatar
      bpf: improve verification speed by droping states · 9f4686c4
      Alexei Starovoitov authored
      Branch instructions, branch targets and calls in a bpf program are
      the places where the verifier remembers states that led to successful
      verification of the program.
      These states are used to prune brute force program analysis.
      For unprivileged programs there is a limit of 64 states per such
      'branching' instructions (maximum length is tracked by max_states_per_insn
      counter introduced in the previous patch).
      Simply reducing this threshold to 32 or lower increases insn_processed
      metric to the point that small valid programs get rejected.
      For root programs there is no limit and cilium programs can have
      max_states_per_insn to be 100 or higher.
      Walking 100+ states multiplied by number of 'branching' insns during
      verification consumes significant amount of cpu time.
      Turned out simple LRU-like mechanism can be used to remove states
      that unlikely will be helpful in future search pruning.
      This patch introduces hit_cnt and miss_cnt counters:
      hit_cnt - this many times this state successfully pruned the search
      miss_cnt - this many times this state was not equivalent to other states
      (and that other states were added to state list)
      
      The heuristic introduced in this patch is:
      if (sl->miss_cnt > sl->hit_cnt * 3 + 3)
        /* drop this state from future considerations */
      
      Higher numbers increase max_states_per_insn (allow more states to be
      considered for pruning) and slow verification speed, but do not meaningfully
      reduce insn_processed metric.
      Lower numbers drop too many states and insn_processed increases too much.
      Many different formulas were considered.
      This one is simple and works well enough in practice.
      (the analysis was done on selftests/progs/* and on cilium programs)
      
      The end result is this heuristic improves verification speed by 10 times.
      Large synthetic programs that used to take a second more now take
      1/10 of a second.
      In cases where max_states_per_insn used to be 100 or more, now it's ~10.
      
      There is a slight increase in insn_processed for cilium progs:
                             before   after
      bpf_lb-DLB_L3.o 	1831	1838
      bpf_lb-DLB_L4.o 	3029	3218
      bpf_lb-DUNKNOWN.o 	1064	1064
      bpf_lxc-DDROP_ALL.o	26309	26935
      bpf_lxc-DUNKNOWN.o	33517	34439
      bpf_netdev.o		9713	9721
      bpf_overlay.o		6184	6184
      bpf_lcx_jit.o		37335	39389
      And 2-3 times improvement in the verification speed.
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Reviewed-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      9f4686c4
    • Alexei Starovoitov's avatar
      bpf: add verifier stats and log_level bit 2 · 06ee7115
      Alexei Starovoitov authored
      In order to understand the verifier bottlenecks add various stats
      and extend log_level:
      log_level 1 and 2 are kept as-is:
      bit 0 - level=1 - print every insn and verifier state at branch points
      bit 1 - level=2 - print every insn and verifier state at every insn
      bit 2 - level=4 - print verifier error and stats at the end of verification
      
      When verifier rejects the program the libbpf is trying to load the program twice.
      Once with log_level=0 (no messages, only error code is reported to user space)
      and second time with log_level=1 to tell the user why the verifier rejected it.
      
      With introduction of bit 2 - level=4 the libbpf can choose to always use that
      level and load programs once, since the verification speed is not affected and
      in case of error the verbose message will be available.
      
      Note that the verifier stats are not part of uapi just like all other
      verbose messages. They're expected to change in the future.
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      06ee7115
  4. 02 Apr, 2019 6 commits
  5. 01 Apr, 2019 1 commit
    • Yonghong Song's avatar
      bpf: add bpffs multi-dimensional array tests in test_btf · 9de2640b
      Yonghong Song authored
      For multiple dimensional arrays like below,
        int a[2][3]
      both llvm and pahole generated one BTF_KIND_ARRAY type like
        . element_type: int
        . index_type: unsigned int
        . number of elements: 6
      
      Such a collapsed BTF_KIND_ARRAY type will cause the divergence
      in BTF vs. the user code. In the compile-once-run-everywhere
      project, the header file is generated from BTF and used for bpf
      program, and the definition in the header file will be different
      from what user expects.
      
      But the kernel actually supports chained multi-dimensional array
      types properly. The above "int a[2][3]" can be represented as
        Type #n:
          . element_type: int
          . index_type: unsigned int
          . number of elements: 3
        Type #(n+1):
          . element_type: type #n
          . index_type: unsigned int
          . number of elements: 2
      
      The following llvm commit
        https://reviews.llvm.org/rL357215
      also enables llvm to generated proper chained multi-dimensional arrays.
      
      The test_btf already has a raw test ("struct test #1") for chained
      multi-dimensional arrays. This patch added amended bpffs test for
      chained multi-dimensional arrays.
      Acked-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: default avatarYonghong Song <yhs@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      9de2640b
  6. 29 Mar, 2019 3 commits
    • Alexei Starovoitov's avatar
      Merge branch 'variable-stack-access' · c3969de8
      Alexei Starovoitov authored
      Andrey Ignatov says:
      
      ====================
      The patch set adds support for stack access with variable offset from helpers.
      
      Patch 1 is the main patch in the set and provides more details.
      Patch 2 adds selftests for new functionality.
      ====================
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      c3969de8
    • Andrey Ignatov's avatar
      selftests/bpf: Test variable offset stack access · 8ff80e96
      Andrey Ignatov authored
      Test different scenarios of indirect variable-offset stack access: out of
      bound access (>0), min_off below initialized part of the stack,
      max_off+size above initialized part of the stack, initialized stack.
      
      Example of output:
        ...
        #856/p indirect variable-offset stack access, out of bound OK
        #857/p indirect variable-offset stack access, max_off+size > max_initialized OK
        #858/p indirect variable-offset stack access, min_off < min_initialized OK
        #859/p indirect variable-offset stack access, ok OK
        ...
      Signed-off-by: default avatarAndrey Ignatov <rdna@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      8ff80e96
    • Andrey Ignatov's avatar
      bpf: Support variable offset stack access from helpers · 2011fccf
      Andrey Ignatov authored
      Currently there is a difference in how verifier checks memory access for
      helper arguments for PTR_TO_MAP_VALUE and PTR_TO_STACK with regard to
      variable part of offset.
      
      check_map_access, that is used for PTR_TO_MAP_VALUE, can handle variable
      offsets just fine, so that BPF program can call a helper like this:
      
        some_helper(map_value_ptr + off, size);
      
      , where offset is unknown at load time, but is checked by program to be
      in a safe rage (off >= 0 && off + size < map_value_size).
      
      But it's not the case for check_stack_boundary, that is used for
      PTR_TO_STACK, and same code with pointer to stack is rejected by
      verifier:
      
        some_helper(stack_value_ptr + off, size);
      
      For example:
        0: (7a) *(u64 *)(r10 -16) = 0
        1: (7a) *(u64 *)(r10 -8) = 0
        2: (61) r2 = *(u32 *)(r1 +0)
        3: (57) r2 &= 4
        4: (17) r2 -= 16
        5: (0f) r2 += r10
        6: (18) r1 = 0xffff888111343a80
        8: (85) call bpf_map_lookup_elem#1
        invalid variable stack read R2 var_off=(0xfffffffffffffff0; 0x4)
      
      Add support for variable offset access to check_stack_boundary so that
      if offset is checked by program to be in a safe range it's accepted by
      verifier.
      Signed-off-by: default avatarAndrey Ignatov <rdna@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      2011fccf
  7. 28 Mar, 2019 2 commits
  8. 27 Mar, 2019 14 commits