1. 27 Jan, 2018 2 commits
    • bpf: improve dead code sanitizing · 2a5418a1
      Daniel Borkmann authored
      Given that we recently had c131187d ("bpf: fix branch pruning
      logic") and 95a762e2 ("bpf: fix incorrect sign extension in
      check_alu_op()"), where, before those fixes, the verifier skipped
      verification of a branch it wrongly assumed to be dead, we should
      not just replace the dead code parts with nops (mov r0,r0). If a
      bug like the one fixed in 95a762e2 shows up again in the future,
      such that the runtime could execute those insns, one potential
      issue with the current approach is that, if the nops sit at the
      end of the program, execution could run out of bounds at some
      point.
      
      The best option in such a case would be to exit the BPF program
      altogether and return an exception code. However, this would
      require two instructions, while a dead code gap could be just a
      single insn long. We would therefore need to place an 'r0 = X; ret'
      snippet either at the very end after the user program or at the
      start before the program (skipping that region on prog entry),
      and then place unconditional ja's into the dead code gap.
      
      While this is more complex but possible, there is still another
      roadblock that currently prevents it, namely BPF-to-BPF calls.
      The issue here is that such an exception could be returned from
      a callee, but the caller would not know that it is an exception
      that needs to be propagated further. A low-complexity alternative
      for now is to use a 'ja -1' insn, which traps execution at that
      point instead of silently doing bad things if we ever get there
      due to bugs.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: xor of a/x in cbpf can be done in 32 bit alu · 1d621674
      Daniel Borkmann authored
      Very minor optimization: it saves 1 byte per program in the cBPF
      prologue emitted by the x86_64 JIT.
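      Where the single byte comes from, as a sketch: the helper below
      encodes "xor reg, reg" (opcode 0x31, register-direct ModRM) for an
      x86_64 register number, with or without the REX.W prefix. It assumes,
      as an illustration of the JIT's register map, that cBPF A (eBPF r0)
      lives in rax (reg 0) and X (eBPF r7) in r13 (reg 13). For rax the
      32-bit form drops the REX prefix (2 bytes instead of 3); for r13 a REX
      prefix is needed anyway to reach the extended register (3 bytes
      either way), so the net saving is 1 byte. A 32-bit xor still clears
      the whole register, since x86_64 zero-extends 32-bit register writes.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Encode "xor reg, reg" for x86_64 register number 0..15 into out[]
 * (at least 3 bytes). is64 selects the 64-bit form (REX.W).
 * Returns the number of bytes written.
 */
static size_t emit_xor_self(uint8_t *out, unsigned reg, int is64)
{
	size_t n = 0;
	unsigned ext = (reg >> 3) & 1; /* 1 for r8..r15 */

	/* REX = 0100WRXB: W = 64-bit operand, R/B extend the reg/rm fields. */
	if (is64 || ext)
		out[n++] = 0x40 | (is64 ? 0x08 : 0) | (ext << 2) | ext;
	out[n++] = 0x31;                                 /* xor r/m, r */
	out[n++] = 0xC0 | ((reg & 7) << 3) | (reg & 7); /* mod=11, reg, rm */
	return n;
}
```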
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  2. 26 Jan, 2018 16 commits
    • samples/bpf: Partially fixes the bpf.o build · c25ef6a5
      Mickaël Salaün authored
      Do not build lib/bpf/bpf.o with this Makefile; use the one from the
      library directory instead. This avoids producing a buggy bpf.o file
      (e.g. one with missing symbols).
      
      This patch is useful if some code (e.g. Landlock tests) needs both the
      bpf.o (from tools/lib/bpf) and the bpf_load.o (from samples/bpf).
      Signed-off-by: Mickaël Salaün <mic@digikod.net>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: clean up from test_tcpbpf_kern.c · 771fc607
      Lawrence Brakmo authored
      Remove commented-out lines from test_tcpbpf_kern.c.
      
      Fixes: d6d4f60c ("bpf: add selftest for tcpbpf")
      Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: Use the IS_FD_ARRAY() macro in map_update_elem() · 9c147b56
      Mickaël Salaün authored
      Make the code more readable.
      Signed-off-by: Mickaël Salaün <mic@digikod.net>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • Merge branch 'bpf-more-sock_ops-callbacks' · 82f1e0f3
      Alexei Starovoitov authored
      Lawrence Brakmo says:
      
      ====================
      This patchset adds support for:
      
      - direct R or R/W access to many tcp_sock fields
      - passing up to 4 arguments to sock_ops BPF functions
      - tcp_sock field bpf_sock_ops_cb_flags for controlling callbacks
      - optionally calling sock_ops BPF program when RTO fires
      - optionally calling sock_ops BPF program when packet is retransmitted
      - optionally calling sock_ops BPF program when TCP state changes
      - access to tclass and sk_txhash
      - new selftest
      
      v2: Fixed commit message 0/11. The commit is to "bpf-next" but the patch
          below used "bpf" and Patchwork didn't work correctly.
      v3: Cleaned up RTO callback as per Yuchung's comment
          Added BPF enum for TCP states as per Alexei's comment
      v4: Fixed compile warnings related to detecting changes between TCP
          internal states and the BPF defined states.
      v5: Fixed comment issues in some selftest files
          Fixed access issue with u64 fields in bpf_sock_ops struct
      v6: Made fixes based on comments from Eric Dumazet:
          The field bpf_sock_ops_cb_flags was added in a hole on 64bit kernels
          Field bpf_sock_ops_cb_flags is now set through a helper function
          which returns an error when a BPF program tries to set bits for
          callbacks that are not supported in the current kernel.
          Added a comment indicating that when adding fields to bpf_sock_ops_kern
          they should be added before the field named "temp" if they need to be
          cleared before calling the BPF function.
      v7: Enforced that fields "op" and "replylong[1] .. replylong[3]" are not writable
          based on comments from Eric Dumazet and Alexei Starovoitov.
          Filled 32 bit hole in bpf_sock_ops struct with sk_txhash based on
          comments from Daniel Borkmann.
          Removed unused functions (tcp_call_bpf_1arg, tcp_call_bpf_4arg) based
          on comments from Daniel Borkmann.
      v8: Add commit message 00/12
          Add Acked-by as appropriate
      v9: Moved the bug fix to the front of the patchset
          Changed RETRANS_CB so it is always called (before it was only called if
          the retransmit succeeded). It is now called with an extra argument, the
          return value of tcp_transmit_skb (0 => success). Based on comments
          from Yuchung Cheng.
          Added support for reading 2 new fields, sacked_out and lost_out, based on
          comments from Yuchung Cheng.
      v10: Moved the callback flags from include/uapi/linux/tcp.h to
           include/uapi/linux/bpf.h
           Cleaned up the test in selftest. Added a timeout so it always completes,
           even if the client is not communicating with the server. Made it faster
           by removing the sleeps. Made sure it works even when called back-to-back
           20 times.
      
      Consists of the following patches:
      [PATCH bpf-next v10 01/12] bpf: Only reply field should be writeable
      [PATCH bpf-next v10 02/12] bpf: Make SOCK_OPS_GET_TCP size
      [PATCH bpf-next v10 03/12] bpf: Make SOCK_OPS_GET_TCP struct
      [PATCH bpf-next v10 04/12] bpf: Add write access to tcp_sock and sock
      [PATCH bpf-next v10 05/12] bpf: Support passing args to sock_ops bpf
      [PATCH bpf-next v10 06/12] bpf: Adds field bpf_sock_ops_cb_flags to
      [PATCH bpf-next v10 07/12] bpf: Add sock_ops RTO callback
      [PATCH bpf-next v10 08/12] bpf: Add support for reading sk_state and
      [PATCH bpf-next v10 09/12] bpf: Add sock_ops R/W access to tclass
      [PATCH bpf-next v10 10/12] bpf: Add BPF_SOCK_OPS_RETRANS_CB
      [PATCH bpf-next v10 11/12] bpf: Add BPF_SOCK_OPS_STATE_CB
      [PATCH bpf-next v10 12/12] bpf: add selftest for tcpbpf
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: add selftest for tcpbpf · d6d4f60c
      Lawrence Brakmo authored
      Added a selftest for tcpbpf (sock_ops) that checks that the appropriate
      callbacks occurred, that it can access tcp_sock fields, and that their
      values are correct.
      
      Run with command: ./test_tcpbpf_user
      Adding the flag "-d" will show why it did not pass.
      Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Add BPF_SOCK_OPS_STATE_CB · d4487491
      Lawrence Brakmo authored
      Adds support for calling sock_ops BPF program when there is a TCP state
      change. Two arguments are used; one for the old state and another for
      the new state.
      
      A new enum in include/uapi/linux/bpf.h exports the TCP states, using
      the current TCP state names prefixed with BPF_. If it ever becomes
      necessary to change the internal TCP state values (other than adding
      more to the end), the internal TCP state value will have to be
      converted to the BPF value before calling the BPF sock_ops function.
      A set of compile-time checks is added in tcp.c to detect if the
      internal and BPF values differ, so that the necessary fixes can be
      made.
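      The mirrored-enum idea can be sketched in plain C11. The internal
      numbering below follows include/net/tcp_states.h as I understand it;
      _Static_assert stands in for the kernel's compile-time checks, and
      tcp_state_to_bpf() is a hypothetical conversion hook that stays the
      identity for as long as every check passes.

```c
/* Internal TCP state numbering (illustrative copy of tcp_states.h). */
enum {
	TCP_ESTABLISHED = 1, TCP_SYN_SENT, TCP_SYN_RECV, TCP_FIN_WAIT1,
	TCP_FIN_WAIT2, TCP_TIME_WAIT, TCP_CLOSE, TCP_CLOSE_WAIT,
	TCP_LAST_ACK, TCP_LISTEN, TCP_CLOSING, TCP_NEW_SYN_RECV,
	TCP_MAX_STATES
};

/* Exported copy: same names with a BPF_ prefix. */
enum {
	BPF_TCP_ESTABLISHED = 1, BPF_TCP_SYN_SENT, BPF_TCP_SYN_RECV,
	BPF_TCP_FIN_WAIT1, BPF_TCP_FIN_WAIT2, BPF_TCP_TIME_WAIT,
	BPF_TCP_CLOSE, BPF_TCP_CLOSE_WAIT, BPF_TCP_LAST_ACK,
	BPF_TCP_LISTEN, BPF_TCP_CLOSING, BPF_TCP_NEW_SYN_RECV,
	BPF_TCP_MAX_STATES
};

/* Build fails if an internal value ever drifts from its BPF twin. */
#define CHECK_STATE(s) \
	_Static_assert((int)BPF_##s == (int)s, #s " differs from BPF_" #s)

CHECK_STATE(TCP_ESTABLISHED);
CHECK_STATE(TCP_SYN_SENT);
CHECK_STATE(TCP_SYN_RECV);
CHECK_STATE(TCP_FIN_WAIT1);
CHECK_STATE(TCP_FIN_WAIT2);
CHECK_STATE(TCP_TIME_WAIT);
CHECK_STATE(TCP_CLOSE);
CHECK_STATE(TCP_CLOSE_WAIT);
CHECK_STATE(TCP_LAST_ACK);
CHECK_STATE(TCP_LISTEN);
CHECK_STATE(TCP_CLOSING);
CHECK_STATE(TCP_NEW_SYN_RECV);
CHECK_STATE(TCP_MAX_STATES);

/* Hypothetical hook: would become a real mapping if a check ever failed. */
static int tcp_state_to_bpf(int state)
{
	return state;
}
```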
      
      New op: BPF_SOCK_OPS_STATE_CB.
      Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Add BPF_SOCK_OPS_RETRANS_CB · a31ad29e
      Lawrence Brakmo authored
      Adds support for calling sock_ops BPF program when there is a
      retransmission. Three arguments are used; one for the sequence number,
      another for the number of segments retransmitted, and the last one for
      the return value of tcp_transmit_skb (0 => success).
      Does not include syn-ack retransmissions.
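      A userspace simulation of how a program might consume the three
      arguments. sketch_ctx is a hypothetical cut-down context (op plus the
      four-word args array), and the op values are illustrative, not the
      uapi numbers. Since args are __u32, the tcp_transmit_skb() result is
      cast back to a signed value before testing for failure.

```c
#include <stdint.h>

/* Illustrative op values; the real ones live in include/uapi/linux/bpf.h. */
enum sketch_op { OP_RETRANS_CB = 1, OP_OTHER = 2 };

/* Hypothetical cut-down context: op plus the four-word args array. */
struct sketch_ctx {
	uint32_t op;
	uint32_t args[4];
};

struct retrans_stats {
	uint64_t events, segs, failures;
	uint32_t last_seq;
};

static void on_sock_op(const struct sketch_ctx *ctx, struct retrans_stats *st)
{
	if (ctx->op != OP_RETRANS_CB)
		return;
	st->events++;
	st->last_seq = ctx->args[0];   /* sequence number */
	st->segs += ctx->args[1];      /* number of segments retransmitted */
	if ((int32_t)ctx->args[2] != 0) /* tcp_transmit_skb() result, 0 => success */
		st->failures++;
}
```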
      
      New op: BPF_SOCK_OPS_RETRANS_CB.
      Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Add sock_ops R/W access to tclass · 6f9bd3d7
      Lawrence Brakmo authored
      Adds direct write access to sk_txhash and access to tclass for ipv6
      flows through getsockopt and setsockopt. Sample usage for tclass:
      
        bpf_getsockopt(skops, SOL_IPV6, IPV6_TCLASS, &v, sizeof(v))
      
      where skops is a pointer to the ctx (struct bpf_sock_ops).
      Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Add support for reading sk_state and more · 44f0e430
      Lawrence Brakmo authored
      Add support for reading many more tcp_sock fields
      
        state          same as sk->sk_state
        rtt_min        same as sk->rtt_min.s[0].v (current rtt_min)
        snd_ssthresh
        rcv_nxt
        snd_nxt
        snd_una
        mss_cache
        ecn_flags
        rate_delivered
        rate_interval_us
        packets_out
        retrans_out
        total_retrans
        segs_in
        data_segs_in
        segs_out
        data_segs_out
        lost_out
        sacked_out
        sk_txhash
        bytes_received (__u64)
        bytes_acked    (__u64)
      Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Add sock_ops RTO callback · f89013f6
      Lawrence Brakmo authored
      Adds an optional call to the sock_ops BPF program, made only if
      BPF_SOCK_OPS_RTO_CB_FLAG is set in bpf_sock_ops_cb_flags.
      The BPF program is passed 2 arguments: icsk_retransmits and whether the
      RTO has expired.
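      The gating can be pictured with a small simulation. sketch_tp,
      call_prog_2arg() and on_rto() are hypothetical stand-ins for the
      tcp_sock, the program invocation and the RTO timer path, and
      RTO_CB_FLAG is assumed to equal BPF_SOCK_OPS_RTO_CB_FLAG (bit 0).
      With the flag clear, the RTO path never calls into BPF.

```c
#include <stdint.h>

#define RTO_CB_FLAG (1u << 0) /* assumed value of BPF_SOCK_OPS_RTO_CB_FLAG */

/* Hypothetical per-connection state. */
struct sketch_tp {
	uint32_t cb_flags;     /* plays the role of bpf_sock_ops_cb_flags */
	unsigned calls;        /* how often the "program" ran */
	uint32_t last_args[2];
};

/* Hypothetical stand-in for running the sock_ops program with 2 args. */
static void call_prog_2arg(struct sketch_tp *tp, uint32_t a0, uint32_t a1)
{
	tp->calls++;
	tp->last_args[0] = a0;
	tp->last_args[1] = a1;
}

/* Simulated RTO path: only call out when the connection opted in. */
static void on_rto(struct sketch_tp *tp, uint32_t retransmits, int expired)
{
	if (tp->cb_flags & RTO_CB_FLAG)
		call_prog_2arg(tp, retransmits, (uint32_t)expired);
}
```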
      Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Adds field bpf_sock_ops_cb_flags to tcp_sock · b13d8807
      Lawrence Brakmo authored
      Adds field bpf_sock_ops_cb_flags to tcp_sock and bpf_sock_ops. Its primary
      use is to determine if there should be calls to sock_ops bpf program at
      various points in the TCP code. The field is initialized to zero,
      disabling the calls. A sock_ops BPF program can set it, per connection and
      as necessary, when the connection is established.
      
      It also adds support for reading and writing the field within a
      sock_ops BPF program. Reading is done by accessing the field directly.
      Writing, however, is done through the helper function
      bpf_sock_ops_cb_flags_set, so that it can return an error if a BPF
      program tries to set a callback that is not supported by the current
      kernel (i.e. when running on an older kernel). The helper function
      returns 0 if it was able to set all of the bits set in the argument, a
      positive number containing the bits that could not be set, or -EINVAL
      if the socket is not a full TCP socket.
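      The return-value contract can be modelled in a few lines. This is a
      userspace sketch, not the helper itself: sketch_sock is hypothetical,
      and SUPPORTED_CB_FLAGS assumes a kernel that knows only three callback
      bits. The commit text only specifies the return values; that the
      supported subset is still applied when some bits are rejected is an
      assumption of this sketch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define SUPPORTED_CB_FLAGS 0x7u /* assumed: 3 callback bits known */

/* Hypothetical socket: fullsock is false for e.g. request sockets. */
struct sketch_sock {
	bool fullsock;
	uint32_t cb_flags;
};

/* 0 = all bits set, >0 = bits that could not be set, -EINVAL = not full sock. */
static int sketch_cb_flags_set(struct sketch_sock *sk, uint32_t argval)
{
	if (!sk->fullsock)
		return -EINVAL;
	sk->cb_flags = argval & SUPPORTED_CB_FLAGS;
	return (int)(argval & ~SUPPORTED_CB_FLAGS);
}
```

      A program that gets a positive result back can tell exactly which
      callbacks this kernel will not deliver and fall back accordingly.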
      
      Examples of where one could call the bpf program:
      
      1) When RTO fires
      2) When a packet is retransmitted
      3) When the connection terminates
      4) When a packet is sent
      5) When a packet is received
      Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Support passing args to sock_ops bpf function · de525be2
      Lawrence Brakmo authored
      Adds support for passing up to 4 arguments to sock_ops bpf functions. It
      reuses the reply union, so the bpf_sock_ops structures do not grow in
      size.
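      Why reusing the union costs no space, as a cut-down mirror of the
      start of struct bpf_sock_ops (the name sketch_sock_ops and the trimmed
      field list are illustrative): args, reply and replylong share the same
      16 bytes, so the kernel can write inputs there and the program can
      overwrite the same slot with its reply.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative mirror of the first fields of struct bpf_sock_ops. */
struct sketch_sock_ops {
	uint32_t op;
	union {
		uint32_t args[4];      /* inputs passed to the program */
		uint32_t reply;        /* program's return value */
		uint32_t replylong[4]; /* optional longer reply */
	};
	uint32_t family;           /* next field after the union */
};
```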
      Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Add write access to tcp_sock and sock fields · b73042b8
      Lawrence Brakmo authored
      This patch adds a macro, SOCK_OPS_SET_FIELD, for writing to
      struct tcp_sock or struct sock fields. This required adding a new
      field "temp" to struct bpf_sock_ops_kern for temporary storage that
      is used by sock_ops_convert_ctx_access. It is used to store and recover
      the contents of a register, so the register can be used to store the
      address of the sk. Since we cannot overwrite the dst_reg because it
      contains the pointer to ctx, nor the src_reg since it contains the value
      we want to store, we need an extra register to contain the address
      of the sk.
      
      Also adds the macro SOCK_OPS_GET_OR_SET_FIELD that calls one of the
      GET or SET macros depending on the value of the TYPE field.
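      The register juggling described above can be emulated in C: each
      statement in store_txhash() corresponds to one emitted BPF insn. The
      types are hypothetical cut-downs, and the scratch-register choice
      (start at r9 and step down past dst and src) is an assumed strategy
      for illustration. dst holds the ctx pointer, src the value to store;
      the scratch register is spilled to ctx->temp and restored afterwards.

```c
#include <stdint.h>

#define NREGS 10 /* r0..r9 */

/* Hypothetical cut-down kernel objects. */
struct sketch_sk { uint32_t sk_txhash; };
struct sketch_kern_ctx { struct sketch_sk *sk; uint64_t temp; };

/* Pick a register that is neither dst (ctx pointer) nor src (value). */
static int pick_scratch(int dst, int src)
{
	int reg = NREGS - 1;

	while (reg == dst || reg == src)
		reg--;
	return reg;
}

/* Emulates "*(u32 *)(ctx->sk + off) = src" without clobbering any register. */
static void store_txhash(uint64_t regs[NREGS], int dst, int src)
{
	struct sketch_kern_ctx *ctx = (struct sketch_kern_ctx *)(uintptr_t)regs[dst];
	int tmp = pick_scratch(dst, src);

	ctx->temp = regs[tmp];                  /* spill scratch reg */
	regs[tmp] = (uint64_t)(uintptr_t)ctx->sk; /* load sk pointer */
	((struct sketch_sk *)(uintptr_t)regs[tmp])->sk_txhash = (uint32_t)regs[src];
	regs[tmp] = ctx->temp;                  /* restore scratch reg */
}
```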
      Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Make SOCK_OPS_GET_TCP struct independent · 34d367c5
      Lawrence Brakmo authored
      Changed SOCK_OPS_GET_TCP to SOCK_OPS_GET_FIELD and added 2
      arguments so now it can also work with struct sock fields.
      The first argument is the name of the field in the bpf_sock_ops
      struct, the 2nd argument is the name of the field in the OBJ struct.
      
      Previous: SOCK_OPS_GET_TCP(FIELD_NAME)
      New:      SOCK_OPS_GET_FIELD(BPF_FIELD, OBJ_FIELD, OBJ)
      
      Where OBJ is either "struct tcp_sock" or "struct sock" (without
      quotation). BPF_FIELD is the name of the field in the bpf_sock_ops
      struct and OBJ_FIELD is the name of the field in the OBJ struct.
      
      Although the field names are currently the same, the kernel struct names
      could change in the future and this change makes it easier to support
      that.
      
      Note that adding access to tcp_sock fields in sock_ops programs does
      not preclude the tcp_sock fields from being removed as long as we are
      willing to do one of the following:
      
        1) Return a fixed value (e.g. 0 or 0xffffffff), or
        2) Make the verifier fail if that field is accessed (i.e. the program
           fails to load) so the user will know that field is no longer
           supported.
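      The three-argument shape can be shown with ordinary C macros. This is
      a simplified sketch: the real macro emits BPF load insns rather than a
      C assignment, and the mock_* structs and GET_FIELD_SKETCH/FIELD_DESC
      names are hypothetical. The point is that BPF_FIELD and OBJ_FIELD are
      named separately, so a kernel-side rename only touches the OBJ_FIELD
      argument, and offsetof/sizeof derive the load location per OBJ type.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cut-down kernel objects. */
struct mock_sock     { uint32_t sk_txhash; uint8_t sk_state; };
struct mock_tcp_sock { uint32_t snd_cwnd; uint64_t bytes_acked; };

/* Where a field lives inside OBJ: byte offset and width. */
struct field_desc { size_t off; size_t size; };

#define FIELD_DESC(OBJ_FIELD, OBJ) \
	((struct field_desc){ offsetof(OBJ, OBJ_FIELD), \
			      sizeof(((OBJ *)0)->OBJ_FIELD) })

/* Copy OBJ_FIELD of the object at objp into ctx->BPF_FIELD. */
#define GET_FIELD_SKETCH(ctx, BPF_FIELD, OBJ_FIELD, OBJ, objp) \
	((ctx)->BPF_FIELD = ((const OBJ *)(objp))->OBJ_FIELD)
```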
      Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Make SOCK_OPS_GET_TCP size independent · a33de397
      Lawrence Brakmo authored
      Make the SOCK_OPS_GET_TCP helper macro size independent (previously it
      only worked with 4-byte fields).
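      Size independence boils down to picking the load width from the
      field's sizeof instead of hard-coding a 4-byte load. The sketch maps a
      byte count to the BPF_SIZE bits of an opcode (W = 0x00, H = 0x08,
      B = 0x10, DW = 0x18), similar in spirit to the kernel's
      bytes_to_bpf_size(), and builds a BPF_LDX | BPF_MEM opcode from it.
      The function names here are illustrative.

```c
#include <errno.h>

#define SZ_W  0x00 /* 4 bytes */
#define SZ_H  0x08 /* 2 bytes */
#define SZ_B  0x10 /* 1 byte  */
#define SZ_DW 0x18 /* 8 bytes */

static int bytes_to_size_bits(int bytes)
{
	switch (bytes) {
	case 1: return SZ_B;
	case 2: return SZ_H;
	case 4: return SZ_W;
	case 8: return SZ_DW;
	default: return -EINVAL;
	}
}

/* BPF_LDX (0x01) | BPF_MEM (0x60) | size bits, or -EINVAL. */
static int ldx_mem_opcode(int bytes)
{
	int sz = bytes_to_size_bits(bytes);

	return sz < 0 ? sz : (0x01 | 0x60 | sz);
}
```

      With this, a load of a u64 field such as bytes_acked becomes opcode
      0x79 (ldxdw) while a u32 field stays 0x61 (ldxw).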
      Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Only reply field should be writeable · 2585cd62
      Lawrence Brakmo authored
      Currently, a sock_ops BPF program can write the op field and all the
      reply fields (reply and replylong). This is a bug. The op field should
      not have been writeable and there is currently no way to use replylong
      field for indices >= 1. This patch enforces that only the reply field
      (which equals replylong[0]) is writeable.
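      The enforced rule can be sketched as a verifier-style predicate over
      context stores: accept only a 4-byte write at the offset of reply.
      sketch_sock_ops is an illustrative cut-down of struct bpf_sock_ops and
      ctx_write_ok() a hypothetical name; the real check lives in the
      sock_ops context-access validation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative cut-down of struct bpf_sock_ops. */
struct sketch_sock_ops {
	uint32_t op;
	union {
		uint32_t args[4];
		uint32_t reply;
		uint32_t replylong[4];
	};
};

/* Allow a store of `size` bytes at ctx offset `off` only if it targets reply. */
static bool ctx_write_ok(size_t off, size_t size)
{
	return off == offsetof(struct sketch_sock_ops, reply) &&
	       size == sizeof(uint32_t);
}
```

      Because reply aliases replylong[0], a write there still works, while
      op and replylong[1..3] are rejected.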
      
      Fixes: 40304b2a ("bpf: BPF support for sock_ops")
      Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  3. 24 Jan, 2018 10 commits
  4. 23 Jan, 2018 9 commits
  5. 22 Jan, 2018 3 commits