1. 17 Feb, 2019 2 commits
  2. 16 Feb, 2019 1 commit
  3. 15 Feb, 2019 3 commits
    • libbpf: Introduce bpf_object__btf · 789f6bab
      Andrey Ignatov authored
      Add new accessor for bpf_object to get opaque struct btf * from it.
      
      struct btf * is needed for all operations with BTF and it's present in
      bpf_object. The only thing missing is a way to get it.
      
      Example use-case is to get BTF key_type_id and value_type_id for a map in
      bpf_object. It can be done with btf__get_map_kv_tids() but that function
      requires struct btf *.
      
      Similar API can be added for struct btf_ext but no use-case for it yet.
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Acked-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • libbpf: Introduce bpf_map__resize · 1a11a4c7
      Andrey Ignatov authored
      Add bpf_map__resize() to change max_entries for a map.
      
      Quite often necessary map size is unknown at compile time and can be
      calculated only at run time.
      
      Currently the following approach is used to do so:
      * bpf_object__open_buffer() to open Elf file from a buffer;
      * bpf_object__find_map_by_name() to find relevant map;
      * bpf_map__def() to get map attributes and create struct
        bpf_create_map_attr from them;
      * update max_entries in bpf_create_map_attr;
      * bpf_create_map_xattr() to create new map with updated max_entries;
      * bpf_map__reuse_fd() to replace the map in bpf_object with newly
        created one.
      
      And after all this bpf_object can finally be loaded. The map will have
      new size.
      
      It 1) is quite a lot of steps; 2) doesn't take BTF into account.
      
      For "2)" even more steps should be made and some of them require changes
      to libbpf (e.g. to get struct btf * from bpf_object).
      
      Instead the whole problem can be solved by introducing simple
      bpf_map__resize() API that checks the map and sets new max_entries if
      the map is not loaded yet.
      
      So the new steps are:
      * bpf_object__open_buffer() to open Elf file from a buffer;
      * bpf_object__find_map_by_name() to find relevant map;
      * bpf_map__resize() to update max_entries.
      
      That's much simpler and works with BTF.
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Acked-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
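The rule described for bpf_map__resize() (set max_entries only while the map is not loaded yet) can be sketched as below. This is a toy model, not libbpf code: `struct toy_map`, `toy_map_resize()` and the choice of error codes are assumptions made for illustration.

```c
#include <errno.h>
#include <stdint.h>

/* Toy stand-in for libbpf's opaque struct bpf_map (hypothetical layout). */
struct toy_map {
	int fd;               /* -1 until the map has been created in the kernel */
	uint32_t max_entries; /* size taken from the ELF map definition */
};

/* Change max_entries, but only while the map is still unloaded. */
int toy_map_resize(struct toy_map *map, uint32_t max_entries)
{
	if (!map || max_entries == 0)
		return -EINVAL;
	if (map->fd >= 0)
		return -EBUSY; /* already created: attributes are fixed */
	map->max_entries = max_entries;
	return 0;
}
```

With a model like this, the caller computes the size at run time, resizes between open and load, and gets an error instead of a silent no-op if it resizes too late.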
    • net: bpf: remove XDP_QUERY_XSK_UMEM enumerator · f8ebfaf6
      Jan Sokolowski authored
      Commit c9b47cc1 ("xsk: fix bug when trying to use both copy and
      zero-copy on one queue id") moved the umem query code to the AF_XDP
      core, and therefore removed the need to query the netdevice for a
      umem.
      
      This patch removes XDP_QUERY_XSK_UMEM and all code that implements that
      behavior, which is just dead code.
      Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
      Acked-by: Björn Töpel <bjorn.topel@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  4. 14 Feb, 2019 12 commits
    • Merge branch 'libbpf-cleanup' · 9875964b
      Alexei Starovoitov authored
      Andrii Nakryiko says:
      
      ====================
      This patchset replaces bzero() with memset() and syncs if_link.h header
      to suppress unsynchronized headers warning.
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • tools: sync uapi/linux/if_link.h header · d9312064
      Andrii Nakryiko authored
      Syncing if_link.h that got out of sync.
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • tools/bpf: replace bzero with memset · 1ad9cbb8
      Andrii Nakryiko authored
      bzero() call is deprecated and superseded by memset().
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Reported-by: David Laight <david.laight@aculab.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
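A minimal illustration of the bzero() to memset() replacement. `struct toy_attr` and `toy_attr_init()` are hypothetical names standing in for whatever attribute struct the caller needs to clear.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical attribute struct, standing in for any struct that must start zeroed. */
struct toy_attr {
	uint32_t map_type;
	uint32_t key_size;
	uint32_t value_size;
	uint32_t max_entries;
};

/* Zero the whole struct; memset(p, 0, n) is the standard C spelling of bzero(p, n). */
void toy_attr_init(struct toy_attr *attr)
{
	memset(attr, 0, sizeof(*attr));
}
```

Using `sizeof(*attr)` rather than a type name keeps the length correct if the pointer's type changes later.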
    • bpf: fix memory leak in bpf_lwt_xmit_reroute · fb405883
      Peter Oskolkov authored
      On error the skb should be freed. Tested with diff/steps
      provided by David Ahern.
      
      v2: surface routing errors to the user instead of a generic EINVAL,
          as suggested by David Ahern.
      Reported-by: David Ahern <dsahern@gmail.com>
      Fixes: 3bd0b152 ("bpf: add handling of BPF_LWT_REROUTE to lwt_bpf.c")
      Signed-off-by: Peter Oskolkov <posk@google.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • Merge branch 'lwt_encap_ip' · 87486b23
      Alexei Starovoitov authored
      Peter Oskolkov says:
      
      ====================
      This patchset implements BPF_LWT_ENCAP_IP mode in bpf_lwt_push_encap
      BPF helper. It enables BPF programs (specifically, BPF_PROG_TYPE_LWT_IN
      and BPF_PROG_TYPE_LWT_XMIT prog types) to add IP encapsulation headers
      to packets (e.g. IP/GRE, GUE, IPIP).
      
      This is useful when thousands of different short-lived flows should be
      encapped, each with different and dynamically determined destination.
      Although lwtunnels can be used in some of these scenarios, the ability
      to dynamically generate encap headers adds more flexibility, e.g.
      when routing depends on the state of the host (reflected in global bpf
      maps).
      
      V2 changes: added flowi-based route lookup, IPv6 encapping, and
         encapping on ingress.
      
      V3 changes: incorporated David Ahern's suggestions:
         - added l3mdev check/oif (patch 2)
         - sync bpf.h from include/uapi into tools/include/uapi
         - selftest tweaks
      
      V4 changes: moved route lookup/dst change from bpf_push_ip_encap
         to when BPF_LWT_REROUTE is handled, as suggested by David Ahern.
      
      V5 changes: added a check in lwt_xmit that skb->protocol stays the
         same if the skb is to be passed back to the stack (ret == BPF_OK).
         Again, suggested by David Ahern.
      
      V6 changes: abandoned.
      
      V7 changes: added handling of GSO packets (patch 3 in the patchset added),
         as suggested by BPF maintainers.
      
      V8 changes:
         - fixed build errors when LWT or IPV6 are not enabled;
         - whitelisted TCP GSO instead of blacklisting SCTP and UDP GSO, as
           suggested by Willem de Bruijn;
         - added validation that the pushed length covers the needed headers
           when GRE/UDP encap is detected, as suggested by Willem de Bruijn;
         - a couple of minor/stylistic tweaks/fixed typos.
      
      V9 changes:
         - fixed a kbuild test robot compiler warning;
         - added ipv6_route_input to ipv6_stub (patch 4 in the patchset
           added), and IPv6 routing functions are now invoked via ipv6_stub,
           as suggested by David Ahern.
      
      V10 changes:
         - removed unnecessary IS_ENABLED and pr_warn_once from patch 5.
      
      V11 changes: fixed a potential dst leak in patch 5, as suggested by
          David Ahern.
      ====================
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests: bpf: add test_lwt_ip_encap selftest · 0fde56e4
      Peter Oskolkov authored
      This patch adds a bpf self-test to cover BPF_LWT_ENCAP_IP mode
      in bpf_lwt_push_encap.
      
      Covered:
      - encapping in LWT_IN and LWT_XMIT
      - IPv4 and IPv6
      
      A follow-up patch will add GSO and VRF-enabled tests.
      Signed-off-by: Peter Oskolkov <posk@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: sync <kdir>/include/.../bpf.h with tools/include/.../bpf.h · 755db477
      Peter Oskolkov authored
      This patch copies changes in bpf.h done by a previous patch
      in this patchset from the kernel uapi include dir into tools
      uapi include dir.
      Signed-off-by: Peter Oskolkov <posk@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: add handling of BPF_LWT_REROUTE to lwt_bpf.c · 3bd0b152
      Peter Oskolkov authored
      This patch builds on top of the previous patch in the patchset,
      which added BPF_LWT_ENCAP_IP mode to bpf_lwt_push_encap. As the
      encapping can result in the skb needing to go via a different
      interface/route/dst, bpf programs can indicate this by returning
      BPF_LWT_REROUTE, which triggers a new route lookup for the skb.
      
      v8 changes: fix kbuild errors when LWTUNNEL_BPF is builtin, but
         IPV6 is a module: as LWTUNNEL_BPF can only be either Y or N,
         call IPV6 routing functions only if they are built-in.
      
      v9 changes:
         - fixed a kbuild test robot compiler warning;
         - call IPV6 routing functions via ipv6_stub.
      
      v10 changes: removed unnecessary IS_ENABLED and pr_warn_once.
      
      v11 changes: fixed a potential dst leak.
      Signed-off-by: Peter Oskolkov <posk@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • ipv6_stub: add ipv6_route_input stub/proxy. · 9b0a6a9d
      Peter Oskolkov authored
      Proxy ip6_route_input via ipv6_stub, for later use by lwt bpf ip encap
      (see the next patch in the patchset).
      Signed-off-by: Peter Oskolkov <posk@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: handle GSO in bpf_lwt_push_encap · ca78801a
      Peter Oskolkov authored
      This patch adds handling of GSO packets in bpf_lwt_push_ip_encap()
      (called from bpf_lwt_push_encap):
      
      * IPIP, GRE, and UDP encapsulation types are deduced by looking
        into iphdr->protocol or ipv6hdr->next_header;
      * SCTP GSO packets are not supported (as bpf_skb_proto_4_to_6
        and similar do);
      * UDP_L4 GSO packets are also not supported (although they are
        not blocked in bpf_skb_proto_4_to_6 and similar), as
        skb_decrease_gso_size() will break it;
      * SKB_GSO_DODGY bit is set.
      
      Note: it may be possible to support SCTP and UDP_L4 gso packets;
            but as these cases seem to be not well handled by other
            tunneling/encapping code paths, the solution should
            be generic enough to apply to all tunneling/encapping code.
      
      v8 changes:
         - make sure that if GRE or UDP encap is detected, there is
           enough of pushed bytes to cover both IP[v6] + GRE|UDP headers;
         - do not reject double-encapped packets;
         - whitelist TCP GSO packets rather than block SCTP GSO and
           UDP GSO.
      Signed-off-by: Peter Oskolkov <posk@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
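Two of the v8 checks in this commit, whitelisting TCP GSO and requiring enough pushed bytes for the IP plus GRE or UDP headers, can be sketched as a toy model. The `TOY_GSO_*` flag values and both function names are hypothetical and do not match the kernel's SKB_GSO_* bits or helpers. The protocol numbers are the IANA values, and the 4-byte and 8-byte figures are the base GRE and UDP header sizes.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical flag values; the kernel's SKB_GSO_* bits differ. */
enum {
	TOY_GSO_TCPV4  = 1 << 0,
	TOY_GSO_TCPV6  = 1 << 1,
	TOY_GSO_UDP_L4 = 1 << 2,
	TOY_GSO_SCTP   = 1 << 3,
};

/* IANA protocol numbers for the encapsulations named above. */
enum { PROTO_IPIP = 4, PROTO_UDP = 17, PROTO_GRE = 47 };

/* Whitelist: accept only packets that carry a TCP GSO type. */
int toy_gso_allowed(unsigned int gso_type)
{
	if (!(gso_type & (TOY_GSO_TCPV4 | TOY_GSO_TCPV6)))
		return -ENOTSUP;
	return 0;
}

/* Pushed bytes must cover the outer IP header plus a base GRE or UDP header. */
int toy_encap_len_ok(int next_proto, size_t ip_hdr_len, size_t pushed_len)
{
	size_t need = ip_hdr_len;

	if (next_proto == PROTO_GRE)
		need += 4; /* base GRE header */
	else if (next_proto == PROTO_UDP)
		need += 8; /* UDP header */
	return pushed_len >= need ? 0 : -EINVAL;
}
```

The whitelist form rejects any GSO type added in the future by default, which is the safer failure mode than a blacklist that has to be kept up to date.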
    • bpf: implement BPF_LWT_ENCAP_IP mode in bpf_lwt_push_encap · 52f27877
      Peter Oskolkov authored
      Implement BPF_LWT_ENCAP_IP mode in bpf_lwt_push_encap BPF helper.
      It enables BPF programs (specifically, BPF_PROG_TYPE_LWT_IN and
      BPF_PROG_TYPE_LWT_XMIT prog types) to add IP encapsulation headers
      to packets (e.g. IP/GRE, GUE, IPIP).
      
      This is useful when thousands of different short-lived flows should be
      encapped, each with different and dynamically determined destination.
      Although lwtunnels can be used in some of these scenarios, the ability
      to dynamically generate encap headers adds more flexibility, e.g.
      when routing depends on the state of the host (reflected in global bpf
      maps).
      
      v7 changes:
       - added a call skb_clear_hash();
       - removed calls to skb_set_transport_header();
       - refuse to encap GSO-enabled packets.
      
      v8 changes:
       - fix build errors when LWT is not enabled.
      
      Note: the next patch in the patchset will deal with GSO-enabled packets,
      which are currently rejected at encapping attempt.
      Signed-off-by: Peter Oskolkov <posk@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: add plumbing for BPF_LWT_ENCAP_IP in bpf_lwt_push_encap · 3e0bd37c
      Peter Oskolkov authored
      This patch adds all needed plumbing in preparation to allowing
      bpf programs to do IP encapping via bpf_lwt_push_encap. Actual
      implementation is added in the next patch in the patchset.
      
      Of note:
      - bpf_lwt_push_encap can now be called from BPF_PROG_TYPE_LWT_XMIT
        prog types in addition to BPF_PROG_TYPE_LWT_IN;
      - if the skb being encapped has GSO set, encapsulation is limited
        to IPIP/IP+GRE/IP+GUE (both IPv4 and IPv6);
      - as route lookups are different for ingress vs egress, the single
        external bpf_lwt_push_encap BPF helper is routed internally to
        either bpf_lwt_in_push_encap or bpf_lwt_xmit_push_encap BPF_CALLs,
        depending on prog type.
      
      v8 changes: fixed a typo.
      Signed-off-by: Peter Oskolkov <posk@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  5. 12 Feb, 2019 7 commits
  6. 11 Feb, 2019 9 commits
    • Merge branch 'skb_sk-sk_fullsock-tcp_sock' · d105fa98
      Alexei Starovoitov authored
      Martin KaFai Lau says:
      
      ====================
      This series adds __sk_buff->sk, "struct bpf_tcp_sock",
      BPF_FUNC_sk_fullsock and BPF_FUNC_tcp_sock.  Together, they provide
      a common way to expose the members of "struct tcp_sock" and
      "struct bpf_sock" for the bpf_prog to access.
      
      The patch series first adds a bpf_sock pointer to __sk_buff
      and a new helper BPF_FUNC_sk_fullsock.
      
      It then adds BPF_FUNC_tcp_sock to get a bpf_tcp_sock
      pointer from a bpf_sock pointer.
      
      The current use case is to allow a cg_skb_bpf_prog to provide
      per cgroup traffic policing/shaping.
      
      Please see individual patch for details.
      
      v2:
      - Patch 1 depends on
        commit d6238766 ("bpf: Fix narrow load on a bpf_sock returned from sk_lookup()")
        in the bpf branch.
      - Add sk_to_full_sk() to bpf_sk_fullsock() and bpf_tcp_sock()
        such that there is a way to access the listener's sk and tcp_sk
        when __sk_buff->sk is a request_sock.
        The comments in the uapi bpf.h are updated accordingly.
      - bpf_ctx_range_till() is used in bpf_sock_common_is_valid_access()
        in patch 1.  Saved a few lines.
      - Patch 2 is new in v2 and it adds "state", "dst_ip4", "dst_ip6" and
        "dst_port" to the bpf_sock.  Narrow load is allowed on them.
        The "state" (i.e. sk_state) has already been used in
        INET_DIAG (e.g. ss -t) and getsockopt(TCP_INFO).
      - While at it in the new patch 2, also allow narrow load on some
        existing fields of the bpf_sock, which are "family", "type", "protocol"
        and "src_port".  Only allow loading from first byte for now.
        i.e. does not allow narrow load starting from the 2nd byte.
      - Add some narrow load tests to the test_verifier's sock.c
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Add test_sock_fields for skb->sk and bpf_tcp_sock · e0b27b3f
      Martin KaFai Lau authored
      This patch adds a C program to show the usage of
      skb->sk and bpf_tcp_sock.
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Add skb->sk, bpf_sk_fullsock and bpf_tcp_sock tests to test_verifer · fb47d1d9
      Martin KaFai Lau authored
      This patch tests accessing the skb->sk and the new helpers,
      bpf_sk_fullsock and bpf_tcp_sock.
      
      The errstr of some existing "reference tracking" tests is changed
      with s/bpf_sock/sock/ and s/socket/sock/ where "sock" is from the
      verifier's reg_type_str[].
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Sync bpf.h to tools/ · 281f9e75
      Martin KaFai Lau authored
      This patch syncs the uapi bpf.h to tools/.
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Add struct bpf_tcp_sock and BPF_FUNC_tcp_sock · 655a51e5
      Martin KaFai Lau authored
      This patch adds a helper function BPF_FUNC_tcp_sock and it
      is currently available for cg_skb and sched_(cls|act):
      
      struct bpf_tcp_sock *bpf_tcp_sock(struct bpf_sock *sk);
      
      int cg_skb_foo(struct __sk_buff *skb) {
      	struct bpf_tcp_sock *tp;
      	struct bpf_sock *sk;
      	__u32 snd_cwnd;
      
      	sk = skb->sk;
      	if (!sk)
      		return 1;
      
      	tp = bpf_tcp_sock(sk);
      	if (!tp)
      		return 1;
      
      	snd_cwnd = tp->snd_cwnd;
      	/* ... */
      
      	return 1;
      }
      
      A 'struct bpf_tcp_sock' is also added to the uapi bpf.h to provide
      read-only access.  bpf_tcp_sock has all the existing tcp_sock fields
      that have already been exposed by bpf_sock_ops,
      i.e. no new tcp_sock fields are exposed in bpf.h.
      
      This helper returns a pointer to the tcp_sock.  If it is not a tcp_sock
      or it cannot be traced back to a tcp_sock by sk_to_full_sk(), it
      returns NULL.  Hence, the caller needs to check for NULL before
      accessing it.
      
      The current use case is to expose members from tcp_sock
      to allow a cg_skb_bpf_prog to provide per cgroup traffic
      policing/shaping.
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Refactor sock_ops_convert_ctx_access · 9b1f3d6e
      Martin KaFai Lau authored
      The next patch will introduce a new "struct bpf_tcp_sock" which
      exposes the same tcp_sock's fields already exposed in
      "struct bpf_sock_ops".
      
      This patch refactors the existing convert_ctx_access() code for
      "struct bpf_sock_ops" to get them ready to be reused for
      "struct bpf_tcp_sock".  The "rtt_min" is not refactored
      in this patch because its handling is different from other
      fields.
      
      The SOCK_OPS_GET_TCP_SOCK_FIELD is new. All other SOCK_OPS_XXX_FIELD
      changes are code move only.
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Add state, dst_ip4, dst_ip6 and dst_port to bpf_sock · aa65d696
      Martin KaFai Lau authored
      This patch adds "state", "dst_ip4", "dst_ip6" and "dst_port" to the
      bpf_sock.  The userspace has already been using "state",
      e.g. inet_diag (ss -t) and getsockopt(TCP_INFO).
      
      This patch also allows narrow load on the following existing fields:
      "family", "type", "protocol" and "src_port".  Unlike IP address,
      the load offset is restricted to the first byte for them but it
      can be relaxed later if there is a use case.
      
      This patch also folds __sock_filter_check_size() into
      bpf_sock_is_valid_access() since it is not called
      from anywhere else.  All bpf_sock checking is now in
      one place.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
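The narrow-load rule described in this commit can be modelled roughly as follows. This is a simplified sketch and not the kernel's actual check: `toy_narrow_load_ok()`, its parameters and the alignment rule are assumptions. It only captures that a load must stay inside the field, and that some fields accept a narrow load only at their first byte.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Simplified model, not the kernel's check: is a load of `size` bytes at
 * `off` permitted on the field [field_off, field_off + field_size)?
 */
bool toy_narrow_load_ok(size_t field_off, size_t field_size,
			size_t off, size_t size, bool first_byte_only)
{
	if (size == 0 || size > field_size || (size & (size - 1)))
		return false; /* power-of-two size, no wider than the field */
	if (off < field_off || off + size > field_off + field_size)
		return false; /* must stay inside the field */
	if ((off - field_off) % size)
		return false; /* aligned relative to the field start */
	if (first_byte_only && off != field_off)
		return false; /* e.g. "family", "type", "protocol", "src_port" */
	return true;
}
```

Under this model a 2-byte load at offset 2 of a 4-byte address field passes, while a 1-byte load at offset 1 of a field flagged `first_byte_only` is refused.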
    • bpf: Add a bpf_sock pointer to __sk_buff and a bpf_sk_fullsock helper · 46f8bc92
      Martin KaFai Lau authored
      In kernel, it is common to check "skb->sk && sk_fullsock(skb->sk)"
      before accessing the fields in sock.  For example, in __netdev_pick_tx:
      
      static u16 __netdev_pick_tx(struct net_device *dev, struct sk_buff *skb,
      			    struct net_device *sb_dev)
      {
      	/* ... */
      
      	struct sock *sk = skb->sk;
      
      		if (queue_index != new_index && sk &&
      		    sk_fullsock(sk) &&
      		    rcu_access_pointer(sk->sk_dst_cache))
      			sk_tx_queue_set(sk, new_index);
      
      	/* ... */
      
      	return queue_index;
      }
      
      This patch adds a "struct bpf_sock *sk" pointer to the "struct __sk_buff"
      where a few of the convert_ctx_access() in filter.c have already been
      accessing the skb->sk sock_common's fields,
      e.g. sock_ops_convert_ctx_access().
      
      "__sk_buff->sk" is a PTR_TO_SOCK_COMMON_OR_NULL in the verifier.
      Some of the fields in "bpf_sock" will not be directly
      accessible through the "__sk_buff->sk" pointer.  It is limited
      by the new "bpf_sock_common_is_valid_access()".
      e.g. The existing "type", "protocol", "mark" and "priority" in bpf_sock
           are not allowed.
      
      The newly added "struct bpf_sock *bpf_sk_fullsock(struct bpf_sock *sk)"
      can be used to get a sk with all accessible fields in "bpf_sock".
      This helper is added to both cg_skb and sched_(cls|act).
      
      int cg_skb_foo(struct __sk_buff *skb) {
      	struct bpf_sock *sk;
      
      	sk = skb->sk;
      	if (!sk)
      		return 1;
      
      	sk = bpf_sk_fullsock(sk);
      	if (!sk)
      		return 1;
      
      	if (sk->family != AF_INET6 || sk->protocol != IPPROTO_TCP)
      		return 1;
      
      	/* some_traffic_shaping(); */
      
      	return 1;
      }
      
      (1) The sk is read only
      
      (2) There is no new "struct bpf_sock_common" introduced.
      
      (3) Future kernel sock's members could be added to bpf_sock only
          instead of repeatedly adding at multiple places like currently
          in bpf_sock_ops_md, bpf_sock_addr_md, sk_reuseport_md...etc.
      
      (4) After "sk = skb->sk", the reg holding sk is in type
          PTR_TO_SOCK_COMMON_OR_NULL.
      
      (5) After bpf_sk_fullsock(), the return type will be in type
          PTR_TO_SOCKET_OR_NULL which is the same as the return type of
          bpf_sk_lookup_xxx().
      
          However, bpf_sk_fullsock() does not take refcnt.  The
          acquire_reference_state() is only depending on the return type now.
          To avoid it, a new is_acquire_function() is checked before calling
          acquire_reference_state().
      
      (6) The WARN_ON in "release_reference_state()" is no longer an
          internal verifier bug.
      
          When reg->id is not found in state->refs[], it means the
          bpf_prog does something wrong like
          "bpf_sk_release(bpf_sk_fullsock(skb->sk))" where reference has
          never been acquired by calling "bpf_sk_fullsock(skb->sk)".
      
          -EINVAL is returned and a verbose message is logged instead of the
          WARN_ON.  A test is added to the test_verifier in a later patch.
      
          Since the WARN_ON in "release_reference_state()" is no longer
          needed, "__release_reference_state()" is folded into
          "release_reference_state()" also.
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
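Points (5) and (6) above can be illustrated with a toy reference tracker. This is only a sketch: `struct toy_state`, `toy_acquire()` and `toy_release()` are hypothetical and much simpler than the verifier's state handling. The idea shown is that only acquiring helpers record an id, so releasing a pointer that was never acquired fails with -EINVAL rather than tripping an internal warning.

```c
#include <errno.h>

#define TOY_MAX_REFS 8

/* Toy verifier state: ids of references the program currently holds. */
struct toy_state {
	int refs[TOY_MAX_REFS];
	int nr_refs;
};

/* Record a reference; called only for helpers that really take a refcnt. */
int toy_acquire(struct toy_state *st, int id)
{
	if (st->nr_refs >= TOY_MAX_REFS)
		return -ENOMEM;
	st->refs[st->nr_refs++] = id;
	return 0;
}

/* Release by id; an unknown id is a program error, reported as -EINVAL. */
int toy_release(struct toy_state *st, int id)
{
	for (int i = 0; i < st->nr_refs; i++) {
		if (st->refs[i] == id) {
			/* Fill the hole with the last entry and shrink the list. */
			st->refs[i] = st->refs[--st->nr_refs];
			return 0;
		}
	}
	return -EINVAL;
}
```

In this model a lookup-style helper would call `toy_acquire()`, while a helper in the style of bpf_sk_fullsock() would not, so a later release of its result is rejected.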
    • bpf: Fix narrow load on a bpf_sock returned from sk_lookup() · 5f456649
      Martin KaFai Lau authored
      By adding this test to test_verifier:
      {
      	"reference tracking: access sk->src_ip4 (narrow load)",
      	.insns = {
      	BPF_SK_LOOKUP,
      	BPF_MOV64_REG(BPF_REG_6, BPF_REG_0),
      	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 3),
      	BPF_LDX_MEM(BPF_H, BPF_REG_2, BPF_REG_0, offsetof(struct bpf_sock, src_ip4) + 2),
      	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
      	BPF_EMIT_CALL(BPF_FUNC_sk_release),
      	BPF_EXIT_INSN(),
      	},
      	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
      	.result = ACCEPT,
      },
      
      The above test loads 2 bytes from sk->src_ip4 where
      sk is obtained by bpf_sk_lookup_tcp().
      
      It hits an internal verifier error from convert_ctx_accesses():
      [root@arch-fb-vm1 bpf]# ./test_verifier 665 665
      Failed to load prog 'Invalid argument'!
      0: (b7) r2 = 0
      1: (63) *(u32 *)(r10 -8) = r2
      2: (7b) *(u64 *)(r10 -16) = r2
      3: (7b) *(u64 *)(r10 -24) = r2
      4: (7b) *(u64 *)(r10 -32) = r2
      5: (7b) *(u64 *)(r10 -40) = r2
      6: (7b) *(u64 *)(r10 -48) = r2
      7: (bf) r2 = r10
      8: (07) r2 += -48
      9: (b7) r3 = 36
      10: (b7) r4 = 0
      11: (b7) r5 = 0
      12: (85) call bpf_sk_lookup_tcp#84
      13: (bf) r6 = r0
      14: (15) if r0 == 0x0 goto pc+3
       R0=sock(id=1,off=0,imm=0) R6=sock(id=1,off=0,imm=0) R10=fp0,call_-1 fp-8=????0000 fp-16=0000mmmm fp-24=mmmmmmmm fp-32=mmmmmmmm fp-40=mmmmmmmm fp-48=mmmmmmmm refs=1
      15: (69) r2 = *(u16 *)(r0 +26)
      16: (bf) r1 = r6
      17: (85) call bpf_sk_release#86
      18: (95) exit
      
      from 14 to 18: safe
      processed 20 insns (limit 131072), stack depth 48
      bpf verifier is misconfigured
      Summary: 0 PASSED, 0 SKIPPED, 1 FAILED
      
      The bpf_sock_is_valid_access() is expecting src_ip4 can be narrowly
      loaded (meaning load any 1 or 2 bytes of the src_ip4) by
      marking info->ctx_field_size.  However, this marked
      ctx_field_size is not used.  This patch fixes it.
      
      Due to the recent refactoring in test_verifier,
      this new test will be added to the bpf-next branch
      (together with the bpf_tcp_sock patchset)
      to avoid merge conflict.
      
      Fixes: c64b7983 ("bpf: Add PTR_TO_SOCKET verifier type")
      Cc: Joe Stringer <joe@wand.net.nz>
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Acked-by: Joe Stringer <joe@wand.net.nz>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  7. 08 Feb, 2019 6 commits
    • Merge branch 'btf-api-extensions' · 28bbfc3a
      Alexei Starovoitov authored
      Andrii Nakryiko says:
      
      ====================
      This patchset introduces a set of new APIs that make it possible to work with BTF
      more effectively (and without involving kernel) for applications like pahole that
      need to manipulate .BTF and .BTF.ext data.
      
      Patch #1 changes existing btf__new() API call to only load and initialize
      struct btf, while exposing new btf__load() API to attempt to load and validate
      BTF in kernel.
      
      Patch #2 adds btf__get_raw_data() API allowing to get access to raw BTF data from
      struct btf.
      
      Patch #3 adds similar btf_ext__get_raw_data() API for working with struct btf_ext.
      
      Patch #4 removes not-yet-stable btf__get_strings() API which was added to be able
      to test contents of struct btf for btf__dedup(). It's now superseded by raw APIs.
      
      v3->v4:
      - formatting fixes
      - renamed btf_ext functions/structs to use "setup" language instead of "copy"
      - removed btf__get_strings from libbpf.map
      
      v2->v3:
      - const void* variants of btf__get_raw_data()
      - added btf_ext__get_raw_data()
      - removed btf__get_strings() and adapted test_btf.c to use btf__get_raw_data()
      
      v1->v2:
      - btf_load() returns just error, not fd
      - fix ordering in libbpf.map
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • tools/bpf: remove btf__get_strings() superseded by raw data API · 49b57e0d
      Andrii Nakryiko authored
      Now that we have btf__get_raw_data() it's trivial for tests to iterate
      over all strings for testing purposes, which eliminates the need for
      btf__get_strings() API.
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Acked-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
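As a sketch of iterating over all strings from raw BTF data, the function below walks the string section of a raw BTF blob. It assumes the blob is laid out as a BTF header followed by the type and string sections, with section offsets counted from the end of the header. `struct raw_btf_header` is a local copy of that layout so the example needs no kernel headers, and `count_btf_strings()` is a hypothetical helper, not a libbpf function. In real use the buffer would come from btf__get_raw_data().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local copy of the BTF header layout, so the sketch needs no kernel headers. */
struct raw_btf_header {
	uint16_t magic;    /* 0xeB9F */
	uint8_t  version;
	uint8_t  flags;
	uint32_t hdr_len;
	uint32_t type_off; /* offsets are relative to the end of the header */
	uint32_t type_len;
	uint32_t str_off;
	uint32_t str_len;
};

/* Count NUL-terminated strings in the string section; -1 on malformed input. */
int count_btf_strings(const void *raw, size_t size)
{
	struct raw_btf_header hdr;
	const char *strs, *p, *end;
	int n = 0;

	if (size < sizeof(hdr))
		return -1;
	memcpy(&hdr, raw, sizeof(hdr));
	if (hdr.magic != 0xeB9F || hdr.hdr_len < sizeof(hdr) || hdr.hdr_len > size)
		return -1;
	if (hdr.str_len == 0 || hdr.str_off > size - hdr.hdr_len ||
	    hdr.str_len > size - hdr.hdr_len - hdr.str_off)
		return -1;
	strs = (const char *)raw + hdr.hdr_len + hdr.str_off;
	end = strs + hdr.str_len;
	if (end[-1] != '\0')
		return -1; /* last string must be terminated */
	for (p = strs; p < end; p += strlen(p) + 1)
		n++;
	return n;
}
```

The empty string at offset 0 of the section is counted like any other entry.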
    • btf: expose API to work with raw btf_ext data · ae4ab4b4
      Andrii Nakryiko authored
      This patch changes struct btf_ext to retain original data in sequential
      block of memory, which makes it possible to expose
      btf_ext__get_raw_data() interface similar to btf__get_raw_data(), allowing
      users of libbpf to get access to raw representation of .BTF.ext section.
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Acked-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • btf: expose API to work with raw btf data · 02c87446
      Andrii Nakryiko authored
      This patch exposes new API btf__get_raw_data() that allows to get a copy
      of raw BTF data out of struct btf. This is useful for external programs
      that need to manipulate raw data, e.g., pahole using btf__dedup() to
      deduplicate BTF type info and then writing it back to file.
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Acked-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • btf: separate btf creation and loading · d29d87f7
      Andrii Nakryiko authored
      This change splits out previous btf__new functionality of constructing
      struct btf and loading it into kernel into two:
      - btf__new() just creates and initializes struct btf
      - btf__load() attempts to load existing struct btf into kernel
      
      btf__free will still close BTF fd, if it was ever loaded successfully
      into kernel.
      
      This change allows users of libbpf to manipulate BTF using its API,
      without the need to unnecessarily load it into kernel.
      
      One of the intended use cases is pahole, which will do DWARF to BTF
      conversion and then use libbpf to do type deduplication, while then
      handling ELF sections overwriting and other concerns on its own.
      
      Fixes: 2d3feca8 ("bpf: btf: print map dump and lookup with btf info")
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Acked-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • tools/bpf: add log_level to bpf_load_program_attr · a4021a35
      Yonghong Song authored
      The kernel verifier has three levels of logs:
          0: no logs
          1: logs mostly useful
        > 1: verbose
      
      Current libbpf API functions bpf_load_program_xattr() and
      bpf_load_program() cannot specify log_level.
      The bcc, however, provides an interface for user to
      specify log_level 2 for verbose output.
      
      This patch added log_level into structure
      bpf_load_program_attr, so users, including bcc, can use
      bpf_load_program_xattr() to change log_level. The
      supported log_level is 0, 1, and 2.
      
      The bpf selftest test_sock.c is modified to enable log_level = 2.
      If the "verbose" in test_sock.c is changed to true,
      the test will output logs like below:
        $ ./test_sock
        func#0 @0
        0: R1=ctx(id=0,off=0,imm=0) R10=fp0,call_-1
        0: (bf) r6 = r1
        1: R1=ctx(id=0,off=0,imm=0) R6_w=ctx(id=0,off=0,imm=0) R10=fp0,call_-1
        1: (61) r7 = *(u32 *)(r6 +28)
        invalid bpf_context access off=28 size=4
      
        Test case: bind4 load with invalid access: src_ip6 .. [PASS]
        ...
        Test case: bind6 allow all .. [PASS]
        Summary: 16 PASSED, 0 FAILED
      
      Some test_sock tests are negative tests and verbose verifier
      log will be printed out as shown in the above.
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>