1. 12 Jul, 2018 4 commits
  2. 11 Jul, 2018 14 commits
  3. 06 Jul, 2018 13 commits
    • Daniel Borkmann's avatar
      Merge branch 'bpf-nfp-mul-div-support' · d90c936f
      Daniel Borkmann authored
      Jiong Wang says:
      
      ====================
      NFP supports u16 and u32 multiplication. Multiplication is done 8-bits per
      step, therefore we need 2 steps for u16 and 4 steps for u32.
      
      We also need one start instruction to initialize the sequence and one or
      two instructions to fetch the result depending on either you need the high
      halve of u32 multiplication.
      
      For ALU64, if either operand is beyond u32's value range, we reject it. One
      thing to note, if the source operand is BPF_K, then we need to check "imm"
      field directly, and we'd reject it if it is negative.  Because for ALU64,
      "imm" (with s32 type) is expected to be sign extended to s64 which NFP mul
      doesn't support. For ALU32, it is fine for "imm" be negative though,
      because the result is 32-bits and here is no difference on the low halve
      of result for signed/unsigned mul, so we will get correct result.
      
      NFP doesn't have integer divide instruction, this patch set uses reciprocal
      algorithm (the basic one, reciprocal_div) to emulate it.
      
      For each u32 divide, we would need 11 instructions to finish the operation.
      
         7 (for multiplication) + 4 (various ALUs) = 11
      
      Given NFP only supports multiplication no bigger than u32, we'd require
      divisor and dividend no bigger than that as well.
      
      Also eBPF doesn't support signed divide and has enforced this on C language
      level by failing compilation. However LLVM assembler hasn't enforced this,
      so it is possible for negative constant to leak in as a BPF_K operand
      through assembly code, we reject such cases as well.
      
      Meanwhile reciprocal_div.h only implemented the basic version of:
      
         "Division by Invariant Integers Using Multiplication"
                                - Torbjörn Granlund and Peter L. Montgomery
      
      This patch set further implements the optimized version (Figure 4.2 in the
      paper) inside existing reciprocal_div.h. When the divider is even and the
      calculated reciprocal magic number doesn't fit u32, we could reduce the
      required ALU instructions from 4 to 2 or 1 for some cases.
      
      The advanced version requires more complex calculation to get the
      reciprocal multiplier and other control variables, but then could reduce
      the required emulation operations. It makes sense to use it for JIT divide
      code generation (for example eBPF JIT backends) for which we are willing to
      trade performance of JITed code with that of host.
      
      v2:
        - add warning in l == 32 code path. (Song Liu/Jakub)
        - jit separate insn sequence for l == 32. (Jakub/Edwin)
        - should use unrestricted operand for mul.
        - once place should use "1U << exp" instead of "1 << exp".
      ====================
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      d90c936f
    • Jiong Wang's avatar
      nfp: bpf: migrate to advanced reciprocal divide in reciprocal_div.h · 9fb410a8
      Jiong Wang authored
      As we are doing JIT, we would want to use the advanced version of the
      reciprocal divide (reciprocal_value_adv) to trade performance with host.
      
      We could reduce the required ALU instructions from 4 to 2 or 1.
      Signed-off-by: default avatarJiong Wang <jiong.wang@netronome.com>
      Reviewed-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      9fb410a8
    • Jiong Wang's avatar
      nfp: bpf: support u32 divide using reciprocal_div.h · 2a952b03
      Jiong Wang authored
      NFP doesn't have integer divide instruction, this patch use reciprocal
      algorithm (the basic one, reciprocal_div) to emulate it.
      
      For each u32 divide, we would need 11 instructions to finish the operation.
      
        7 (for multiplication) + 4 (various ALUs) = 11
      
      Given NFP only supports multiplication no bigger than u32, we'd require
      divisor and dividend no bigger than that as well.
      
      Also eBPF doesn't support signed divide and has enforced this on C language
      level by failing compilation. However LLVM assembler hasn't enforced this,
      so it is possible for negative constant to leak in as a BPF_K operand
      through assembly code, we reject such cases as well.
      Signed-off-by: default avatarJiong Wang <jiong.wang@netronome.com>
      Reviewed-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: default avatarSong Liu <songliubraving@fb.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      2a952b03
    • Jiong Wang's avatar
      nfp: bpf: support u16 and u32 multiplications · d3d23fdb
      Jiong Wang authored
      NFP supports u16 and u32 multiplication. Multiplication is done 8-bits per
      step, therefore we need 2 steps for u16 and 4 steps for u32.
      
      We also need one start instruction to initialize the sequence and one or
      two instructions to fetch the result depending on either you need the high
      halve of u32 multiplication.
      
      For ALU64, if either operand is beyond u32's value range, we reject it. One
      thing to note, if the source operand is BPF_K, then we need to check "imm"
      field directly, and we'd reject it if it is negative.  Because for ALU64,
      "imm" (with s32 type) is expected to be sign extended to s64 which NFP mul
      doesn't support. For ALU32, it is fine for "imm" be negative though,
      because the result is 32-bits and here is no difference on the low halve
      of result for signed/unsigned mul, so we will get correct result.
      Signed-off-by: default avatarJiong Wang <jiong.wang@netronome.com>
      Reviewed-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: default avatarSong Liu <songliubraving@fb.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      d3d23fdb
    • Jiong Wang's avatar
      nfp: bpf: copy range info for all operands of all ALU operations · 33b94310
      Jiong Wang authored
      NFP verifier hook is coping range information of the shift amount for
      indirect shift operation so optimized shift sequences could be generated.
      
      We want to use range info to do more things. For example, to decide whether
      multiplication and divide are supported on the given range.
      
      This patch simply let NFP verifier hook to copy range info for all operands
      of all ALU operands.
      Signed-off-by: default avatarJiong Wang <jiong.wang@netronome.com>
      Reviewed-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: default avatarSong Liu <songliubraving@fb.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      33b94310
    • Jiong Wang's avatar
      nfp: bpf: rename umin/umax to umin_src/umax_src · 662c5472
      Jiong Wang authored
      The two fields are a copy of umin and umax info of bpf_insn->src_reg
      generated by verifier.
      
      Rename to make their meaning clear.
      Signed-off-by: default avatarJiong Wang <jiong.wang@netronome.com>
      Reviewed-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: default avatarSong Liu <songliubraving@fb.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      662c5472
    • Jiong Wang's avatar
      lib: reciprocal_div: implement the improved algorithm on the paper mentioned · 06ae4826
      Jiong Wang authored
      The new added "reciprocal_value_adv" implements the advanced version of the
      algorithm described in Figure 4.2 of the paper except when
      "divisor > (1U << 31)" whose ceil(log2(d)) result will be 32 which then
      requires u128 divide on host. The exception case could be easily handled
      before calling "reciprocal_value_adv".
      
      The advanced version requires more complex calculation to get the
      reciprocal multiplier and other control variables, but then could reduce
      the required emulation operations.
      
      It makes no sense to use this advanced version for host divide emulation,
      those extra complexities for calculating multiplier etc could completely
      waive our saving on emulation operations.
      
      However, it makes sense to use it for JIT divide code generation (for
      example eBPF JIT backends) for which we are willing to trade performance of
      JITed code with that of host. As shown by the following pseudo code, the
      required emulation operations could go down from 6 (the basic version) to 3
      or 4.
      
      To use the result of "reciprocal_value_adv", suppose we want to calculate
      n/d, the C-style pseudo code will be the following, it could be easily
      changed to real code generation for other JIT targets.
      
        struct reciprocal_value_adv rvalue;
        u8 pre_shift, exp;
      
        // handle exception case.
        if (d >= (1U << 31)) {
          result = n >= d;
          return;
        }
        rvalue = reciprocal_value_adv(d, 32)
        exp = rvalue.exp;
        if (rvalue.is_wide_m && !(d & 1)) {
          // floor(log2(d & (2^32 -d)))
          pre_shift = fls(d & -d) - 1;
          rvalue = reciprocal_value_adv(d >> pre_shift, 32 - pre_shift);
        } else {
          pre_shift = 0;
        }
      
        // code generation starts.
        if (imm == 1U << exp) {
          result = n >> exp;
        } else if (rvalue.is_wide_m) {
          // pre_shift must be zero when reached here.
          t = (n * rvalue.m) >> 32;
          result = n - t;
          result >>= 1;
          result += t;
          result >>= rvalue.sh - 1;
        } else {
          if (pre_shift)
            result = n >> pre_shift;
          result = ((u64)result * rvalue.m) >> 32;
          result >>= rvalue.sh;
        }
      Signed-off-by: default avatarJiong Wang <jiong.wang@netronome.com>
      Reviewed-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      06ae4826
    • Roman Gushchin's avatar
      bpftool: add bash completion for cgroup tree command · 02000b55
      Roman Gushchin authored
      This commit adds a bash completion to the bpftool cgroup tree
      command.
      Signed-off-by: default avatarRoman Gushchin <guro@fb.com>
      Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
      Cc: Quentin Monnet <quentin.monnet@netronome.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      02000b55
    • Roman Gushchin's avatar
      bpftool: document cgroup tree command · 7d31a0a1
      Roman Gushchin authored
      Describe cgroup tree command in the corresponding bpftool man page.
      Signed-off-by: default avatarRoman Gushchin <guro@fb.com>
      Acked-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Cc: Quentin Monnet <quentin.monnet@netronome.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      7d31a0a1
    • Roman Gushchin's avatar
      bpftool: introduce cgroup tree command · 2058b383
      Roman Gushchin authored
      This commit introduces a new bpftool command: cgroup tree.
      The idea is to iterate over the whole cgroup tree and print
      all attached programs.
      
      I was debugging a bpf/systemd issue, and found, that there is
      no simple way to listen all bpf programs attached to cgroups.
      I did master something in bash, but after some time got tired of it,
      and decided, that adding a dedicated bpftool command could be
      a better idea.
      
      So, here it is:
        $ sudo ./bpftool cgroup tree
        CgroupPath
        ID       AttachType      AttachFlags     Name
        /sys/fs/cgroup/system.slice/systemd-machined.service
            18       ingress
            17       egress
        /sys/fs/cgroup/system.slice/systemd-logind.service
            20       ingress
            19       egress
        /sys/fs/cgroup/system.slice/systemd-udevd.service
            16       ingress
            15       egress
        /sys/fs/cgroup/system.slice/systemd-journald.service
            14       ingress
            13       egress
      Signed-off-by: default avatarRoman Gushchin <guro@fb.com>
      Acked-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Cc: Quentin Monnet <quentin.monnet@netronome.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      2058b383
    • David S. Miller's avatar
      Merge branch 'IP-listification-follow-ups' · ab8565af
      David S. Miller authored
      Edward Cree says:
      
      ====================
      IP listification follow-ups
      
      While working on IPv6 list processing, I found another bug in the IPv4
       version.  So this patch series has that fix, and the IPv6 version with
       both fixes incorporated.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ab8565af
    • Edward Cree's avatar
      net: ipv6: listify ipv6_rcv() and ip6_rcv_finish() · d8269e2c
      Edward Cree authored
      Essentially the same as the ipv4 equivalents.
      Signed-off-by: default avatarEdward Cree <ecree@solarflare.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d8269e2c
    • Edward Cree's avatar
      net: ipv4: fix list processing on L3 slave devices · efe6aaca
      Edward Cree authored
      If we have an L3 master device, l3mdev_ip_rcv() will steal the skb, but
       we were returning NET_RX_SUCCESS from ip_rcv_finish_core() which meant
       that ip_list_rcv_finish() would keep it on the list.  Instead let's
       move the l3mdev_ip_rcv() call into the caller, so that our response to
       a steal can be different in the single packet path (return
       NET_RX_SUCCESS) and the list path (forget this packet and continue).
      
      Fixes: 5fa12739 ("net: ipv4: listify ip_rcv_finish")
      Signed-off-by: default avatarEdward Cree <ecree@solarflare.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      efe6aaca
  4. 05 Jul, 2018 9 commits