1. 11 Nov, 2018 2 commits
    • selftests/bpf: Test narrow loads with off > 0 in test_verifier · 6c2afb67
      Andrey Ignatov authored
      Test the following narrow loads in test_verifier for context __sk_buff
      (a hypothetical example entry is sketched after the list):
      * off=1, size=1 - ok;
      * off=2, size=1 - ok;
      * off=3, size=1 - ok;
      * off=0, size=2 - ok;
      * off=1, size=2 - fail;
      * off=2, size=2 - ok;
      * off=3, size=2 - fail.
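      A hypothetical test entry for one of the newly allowed loads, written in the
      style of tools/testing/selftests/bpf/test_verifier.c (the description string,
      the choice of the __sk_buff->mark field and the exact struct bpf_test fields
      are illustrative only, not the tests added by this commit), could look like:

        {
          "narrow load, off=3 size=1 on __sk_buff->mark",
          .insns = {
            /* 1-byte load of the last byte of the 4-byte mark field;
             * accepted once narrow loads with off > 0 are allowed.
             */
            BPF_LDX_MEM(BPF_B, BPF_REG_2, BPF_REG_1,
                        offsetof(struct __sk_buff, mark) + 3),
            BPF_MOV64_IMM(BPF_REG_0, 0),
            BPF_EXIT_INSN(),
          },
          .result = ACCEPT,
          .prog_type = BPF_PROG_TYPE_SCHED_CLS,
        },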
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Allow narrow loads with offset > 0 · 46f53a65
      Andrey Ignatov authored
      Currently the BPF verifier allows narrow loads for a context field only with
      offset zero. E.g. if there is a __u32 field then only the following
      loads are permitted:
        * off=0, size=1 (narrow);
        * off=0, size=2 (narrow);
        * off=0, size=4 (full).
      
      On the other hand, LLVM can generate a load with a non-zero offset that
      makes sense from the program logic point of view, but the verifier
      doesn't accept it.
      
      E.g. tools/testing/selftests/bpf/sendmsg4_prog.c has code:
      
        #define DST_IP4			0xC0A801FEU /* 192.168.1.254 */
        ...
        	if ((ctx->user_ip4 >> 24) == (bpf_htonl(DST_IP4) >> 24) &&
      
      where ctx is struct bpf_sock_addr.
      
      Some versions of LLVM can produce the following bytecode for it:
      
             8:       71 12 07 00 00 00 00 00         r2 = *(u8 *)(r1 + 7)
             9:       67 02 00 00 18 00 00 00         r2 <<= 24
            10:       18 03 00 00 00 00 00 fe 00 00 00 00 00 00 00 00         r3 = 4261412864 ll
            12:       5d 32 07 00 00 00 00 00         if r2 != r3 goto +7 <LBB0_6>
      
      where `*(u8 *)(r1 + 7)` means a narrow load of ctx->user_ip4 with size=1
      and offset=3 (7 - sizeof(ctx->user_family) = 3). This load is currently
      rejected by the verifier.
      
      The verifier code that rejects such loads is in bpf_ctx_narrow_access_ok(),
      which means any is_valid_access implementation that uses the function
      behaves this way, e.g. bpf_skb_is_valid_access() for __sk_buff or
      sock_addr_is_valid_access() for bpf_sock_addr.
      
      The patch makes such loads supported. The offset can be in [0; size_default)
      but has to be a multiple of the load size. E.g. for a __u32 field the
      following loads are supported now (a sketch of the check follows the list):
        * off=0, size=1 (narrow);
        * off=1, size=1 (narrow);
        * off=2, size=1 (narrow);
        * off=3, size=1 (narrow);
        * off=0, size=2 (narrow);
        * off=2, size=2 (narrow);
        * off=0, size=4 (full).
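      A minimal, self-contained sketch of this rule (not the kernel's exact
      bpf_ctx_narrow_access_ok() implementation) looks roughly like:

        #include <stdbool.h>
        #include <stdint.h>

        /* A narrow load of `size` bytes at `off` within a field of
         * `size_default` bytes is allowed iff the load size is a power of
         * two no larger than the field and the offset is a size-aligned
         * position inside the field.
         */
        static bool narrow_access_ok(uint32_t off, uint32_t size,
                                     uint32_t size_default)
        {
                return size > 0 && (size & (size - 1)) == 0 && /* power-of-two size */
                       size <= size_default &&                 /* load fits in field */
                       off < size_default &&                   /* offset inside field */
                       off % size == 0;                        /* multiple of size */
        }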
      Reported-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  2. 10 Nov, 2018 18 commits
  3. 09 Nov, 2018 7 commits
    • bpf: Extend the sk_lookup() helper to XDP hookpoint. · c8123ead
      Nitin Hande authored
      This patch extends the sk_lookup() BPF API to the XDP hookpoint. The
      sk_lookup() helper performs a lookup on an incoming packet to find the
      corresponding socket that will receive it. This BPF API is currently
      supported at the tc hookpoint; this patch extends it to the XDP
      hookpoint. An XDP program can map the incoming packet to the 5-tuple
      parameter and invoke the API to find the corresponding socket structure.
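      A minimal sketch of such an XDP program (written with current libbpf
      conventions; the section name, includes and the BPF_F_CURRENT_NETNS value
      reflect today's uapi/libbpf rather than the exact selftest added with this
      series) could look like:

        #include <linux/bpf.h>
        #include <linux/if_ether.h>
        #include <linux/in.h>
        #include <linux/ip.h>
        #include <linux/tcp.h>
        #include <bpf/bpf_helpers.h>
        #include <bpf/bpf_endian.h>

        SEC("xdp")
        int xdp_sk_lookup(struct xdp_md *ctx)
        {
                void *data = (void *)(long)ctx->data;
                void *data_end = (void *)(long)ctx->data_end;
                struct ethhdr *eth = data;
                struct iphdr *iph = data + sizeof(*eth);
                struct tcphdr *tcp = (void *)iph + sizeof(*iph);
                struct bpf_sock_tuple tuple = {};
                struct bpf_sock *sk;

                /* One bounds check covers eth, iph and tcp: they are contiguous. */
                if ((void *)(tcp + 1) > data_end)
                        return XDP_PASS;
                if (eth->h_proto != bpf_htons(ETH_P_IP) ||
                    iph->protocol != IPPROTO_TCP)
                        return XDP_PASS;

                /* Map the incoming packet to the 5-tuple parameter... */
                tuple.ipv4.saddr = iph->saddr;
                tuple.ipv4.daddr = iph->daddr;
                tuple.ipv4.sport = tcp->source;
                tuple.ipv4.dport = tcp->dest;

                /* ...and look up the owning socket from the XDP hookpoint. */
                sk = bpf_sk_lookup_tcp(ctx, &tuple, sizeof(tuple.ipv4),
                                       BPF_F_CURRENT_NETNS, 0);
                if (sk)
                        bpf_sk_release(sk);
                return XDP_PASS;
        }

        char _license[] SEC("license") = "GPL";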
      Signed-off-by: Nitin Hande <Nitin.Hande@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpftool: Improve handling of ENOENT on map dumps · bf598a8f
      David Ahern authored
      bpftool output is not user friendly when dumping a map with only a few
      populated entries:
      
          $ bpftool map
          1: devmap  name tx_devmap  flags 0x0
                  key 4B  value 4B  max_entries 64  memlock 4096B
          2: array  name tx_idxmap  flags 0x0
                  key 4B  value 4B  max_entries 64  memlock 4096B
      
          $ bpftool map dump id 1
          key:
          00 00 00 00
          value:
          No such file or directory
          key:
          01 00 00 00
          value:
          No such file or directory
          key:
          02 00 00 00
          value:
          No such file or directory
          key: 03 00 00 00  value: 03 00 00 00
      
      Handle ENOENT by keeping the line format sane and dumping
      "<no entry>" for the value:
      
          $ bpftool map dump id 1
          key: 00 00 00 00  value: <no entry>
          key: 01 00 00 00  value: <no entry>
          key: 02 00 00 00  value: <no entry>
          key: 03 00 00 00  value: 03 00 00 00
          ...
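      A hypothetical sketch of that handling (the function name and layout are
      illustrative only, not bpftool's actual dump code):

        #include <errno.h>
        #include <stdio.h>
        #include <string.h>
        #include <bpf/bpf.h>

        /* Print one "key: ... value: ..." line; on ENOENT keep the one-line
         * format and print "<no entry>" instead of the strerror() text.
         */
        static void dump_map_entry(int fd, const unsigned char *key,
                                   unsigned char *value,
                                   unsigned int key_size, unsigned int value_size)
        {
                unsigned int i;

                printf("key:");
                for (i = 0; i < key_size; i++)
                        printf(" %02x", key[i]);

                if (bpf_map_lookup_elem(fd, key, value)) {
                        printf("  value: %s\n",
                               errno == ENOENT ? "<no entry>" : strerror(errno));
                        return;
                }

                printf("  value:");
                for (i = 0; i < value_size; i++)
                        printf(" %02x", value[i]);
                printf("\n");
        }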
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • selftests/bpf: add a test case for sock_ops perf-event notification · 435f90a3
      Sowmini Varadhan authored
      This patch provides a tcp_bpf based eBPF sample. The test
      
      - uses ncat(1) as the TCP client program to connect() to a port
        with the intention of triggering SYN retransmissions: we
        first install an iptables DROP rule to make sure ncat SYNs are
        resent (instead of aborting instantly after a TCP RST)
      
      - has a bpf kernel module that sends a perf-event notification for
        each TCP retransmit, and also tracks the number of such notifications
        sent in the global_map
      
      The test passes when the number of event notifications intercepted
      in user-space matches the value in the global_map.
      Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: add perf event notification support for sock_ops · a5a3a828
      Sowmini Varadhan authored
      This patch allows eBPF programs that use sock_ops to send perf-based
      event notifications using bpf_perf_event_output(). Our main use case
      for this is the following:
      
        We would like to monitor some subset of TCP sockets in user-space,
        (the monitoring application would define 4-tuples it wants to monitor)
        using TCP_INFO stats to analyze reported problems. The idea is to
        use those stats to see where the bottlenecks are likely to be ("is
        it application-limited?" or "is there evidence of BufferBloat in
        the path?" etc).
      
        Today we can do this by periodically polling for tcp_info, but this
        could be made more efficient if the kernel would asynchronously
        notify the application via tcp_info when some "interesting"
        thresholds (e.g., "RTT variance > X", or "total_retrans > Y" etc)
        are reached. And to make this effective, it is better if
        we could apply the threshold check *before* constructing the
        tcp_info netlink notification, so that we don't waste resources
        constructing notifications that will be discarded by the filter.
      
      This work solves the problem by adding perf event based notification
      support for sock_ops. The eBPF program can thus be designed to apply
      any desired filters to the bpf_sock_ops and trigger a perf event
      notification based on the evaluation from the filter. The user space
      component can use these perf event notifications to either read any
      state managed by the eBPF program, or issue a TCP_INFO netlink call
      if desired.
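      As a rough sketch (written with current libbpf conventions; the map name,
      event layout and the retransmit threshold are made up for illustration),
      such a sock_ops program could look like:

        #include <linux/bpf.h>
        #include <bpf/bpf_helpers.h>

        struct {
                __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
                __uint(key_size, sizeof(int));
                __uint(value_size, sizeof(__u32));
        } perf_events SEC(".maps");

        struct retrans_event {
                __u32 total_retrans;
                __u32 snd_cwnd;
        };

        SEC("sockops")
        int notify_retrans(struct bpf_sock_ops *skops)
        {
                struct retrans_event ev;

                /* Ask for retransmit callbacks once the connection is established. */
                if (skops->op == BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB ||
                    skops->op == BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB)
                        bpf_sock_ops_cb_flags_set(skops, BPF_SOCK_OPS_RETRANS_CB_FLAG);

                /* Apply the filter in the kernel: only notify past a threshold. */
                if (skops->op == BPF_SOCK_OPS_RETRANS_CB &&
                    skops->total_retrans > 2) {
                        ev.total_retrans = skops->total_retrans;
                        ev.snd_cwnd = skops->snd_cwnd;
                        bpf_perf_event_output(skops, &perf_events, BPF_F_CURRENT_CPU,
                                              &ev, sizeof(ev));
                }
                return 1;
        }

        char _license[] SEC("license") = "GPL";

      User space then reads these perf events and, if needed, issues a TCP_INFO
      netlink query only for the sockets that actually crossed the threshold.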
      Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Co-developed-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • Merge branch 'bpf-max-pkt-offset' · 185067a8
      Daniel Borkmann authored
      Jiong Wang says:
      
      ====================
      The maximum packet offset accessed by one BPF program is useful
      information.
      
      Sometimes a packet may be split, and for some reasons (for example,
      performance) we may want to reject the BPF program if the maximum packet
      size it accesses would trigger such a split. Normally, the MTU value is
      treated as the maximum packet size, but a BPF program does not always
      access the whole packet; it may only access the head portion of the data.
      
      We can let the verifier calculate the maximum packet offset ever used and
      record it inside the prog auxiliary information structure as a new field,
      "max_pkt_offset".
      ====================
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • nfp: bpf: relax prog rejection through max_pkt_offset · cf599f50
      Jiong Wang authored
      NFP refuses to offload programs whenever the MTU is set to a value larger
      than the maximum packet bytes that fit in NFP Cluster Target Memory (CTM).
      However, an eBPF program doesn't always need to access the whole packet
      data.
      
      The verifier now calculates the maximum direct packet access (DPA) offset
      and keeps it in max_pkt_offset inside the prog auxiliary information. This
      patch relaxes prog rejection based on max_pkt_offset.
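      A sketch of the relaxed check (hypothetical helper name; the real change
      lives in the NFP offload code) is roughly:

        #include <linux/bpf.h>
        #include <linux/filter.h>

        /* Instead of rejecting the offload whenever the netdev MTU exceeds
         * what Cluster Target Memory can hold, compare the device limit
         * against the largest packet offset the verifier saw the program
         * access (prog->aux->max_pkt_offset).
         */
        static int nfp_check_pkt_access_limit(const struct bpf_prog *prog,
                                              unsigned int dev_max_pkt_bytes)
        {
                if (prog->aux->max_pkt_offset > dev_max_pkt_bytes)
                        return -EOPNOTSUPP;
                return 0;
        }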
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: let verifier to calculate and record max_pkt_offset · e647815a
      Jiong Wang authored
      In check_packet_access(), update max_pkt_offset after the offset has passed
      __check_packet_access().
      
      It should be safe to use u32 for max_pkt_offset as explained in code
      comment.
      
      Also, when there is a tail call, the max_pkt_offset of the called program is
      unknown, so conservatively set max_pkt_offset to MAX_PACKET_OFF in that
      case.
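      A simplified, self-contained sketch of this bookkeeping (not the verifier's
      exact code; MAX_PACKET_OFF is the verifier's conservative 0xffff bound):

        #include <stdint.h>

        #define MAX_PACKET_OFF 0xffff  /* verifier's conservative upper bound */

        struct prog_aux { uint32_t max_pkt_offset; };

        /* After __check_packet_access() accepts an access, remember the
         * largest packet byte the program may touch, using the register's
         * maximum unsigned offset.
         */
        static void record_pkt_access(struct prog_aux *aux, uint32_t off,
                                      uint64_t reg_umax, uint32_t size)
        {
                uint64_t end = (uint64_t)off + reg_umax + size - 1;

                if (end > aux->max_pkt_offset)
                        aux->max_pkt_offset = (uint32_t)end;
        }

        /* A tail call makes the callee's accesses unknown, so pin the
         * recorded maximum to the conservative bound.
         */
        static void record_tail_call(struct prog_aux *aux)
        {
                aux->max_pkt_offset = MAX_PACKET_OFF;
        }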
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  4. 07 Nov, 2018 13 commits