1. 27 Apr, 2019 13 commits
    • bpf: Add verifier tests for the bpf_sk_storage · 7a9bb976
      Martin KaFai Lau authored
      This patch adds verifier tests for the bpf_sk_storage:
      1. ARG_PTR_TO_MAP_VALUE_OR_NULL
      2. Map and helper compatibility (e.g. disallow bpf_map_lookup_elem)
      
      It also takes this opportunity to remove the unused struct btf_raw_data
      and to use the BTF encoding macros from "test_btf.h".
      Acked-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Refactor BTF encoding macro to test_btf.h · 3f4d4c74
      Martin KaFai Lau authored
      Refactor the common BTF encoding macros so that other tests can use them.
      libbpf may reuse some of them in the future, which requires more thought
      before publishing them as a libbpf API.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Support BPF_MAP_TYPE_SK_STORAGE in bpf map probing · a19f89f3
      Martin KaFai Lau authored
      This patch supports probing for the new BPF_MAP_TYPE_SK_STORAGE.
      BPF_MAP_TYPE_SK_STORAGE enforces BTF usage, so the new probe also
      needs to create and load a BTF.
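      
      A minimal userspace sketch of such a probe, assuming libbpf's
      bpf_probe_map_type() API (the standalone program below is illustrative
      and not part of the patch):
      
          /* Sketch: ask libbpf whether the running kernel supports
           * BPF_MAP_TYPE_SK_STORAGE.  Internally the probe builds a small
           * BTF blob and a test map, as described above. */
          #include <stdbool.h>
          #include <stdio.h>
          #include <linux/bpf.h>
          #include <bpf/libbpf.h>
          
          int main(void)
          {
                  /* ifindex 0: probe the host, not an offload device */
                  bool ok = bpf_probe_map_type(BPF_MAP_TYPE_SK_STORAGE, 0);
          
                  printf("BPF_MAP_TYPE_SK_STORAGE is %s\n",
                         ok ? "supported" : "not supported");
                  return 0;
          }
      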
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Sync bpf.h to tools · 948d930e
      Martin KaFai Lau authored
      This patch syncs bpf.h to tools/.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Introduce bpf sk local storage · 6ac99e8f
      Martin KaFai Lau authored
      After allowing a bpf prog to
      - directly read the skb->sk ptr
      - get the fullsock bpf_sock by "bpf_sk_fullsock()"
      - get the bpf_tcp_sock by "bpf_tcp_sock()"
      - get the listener sock by "bpf_get_listener_sock()"
      - avoid duplicating the fields of "(bpf_)sock" and "(bpf_)tcp_sock"
        into different bpf running contexts,
      
      this patch is another effort to make bpf's network programming
      more intuitive (together with memory and performance benefits).
      
      When a bpf prog needs to store data for a sk, the current practice is to
      define a map with the usual 4-tuple (src/dst ip/port) as the key.
      If multiple bpf progs need to store different sk data, multiple maps
      have to be defined, wasting memory on the duplicated keys
      (i.e. the 4-tuple here) in each of the bpf maps.
      [ The smallest key could be the sk pointer itself, which requires
        some enhancement in the verifier and is a separate topic. ]
      
      Also, the bpf prog needs to clean up the elem when the sk is freed.
      Otherwise, the bpf map quickly becomes full and unusable.
      The sk-free tracking can currently be done during sk state
      transitions (e.g. BPF_SOCK_OPS_STATE_CB).
      
      The size of the map needs to be predefined, which usually ends up
      over-provisioned in production.  Even if the map were resizable,
      since sks naturally come and go, this potential resize operation is
      arguably redundant if the data can be attached directly to the sk
      itself instead of proxied through a bpf map.
      
      This patch introduces sk->sk_bpf_storage to provide local storage space
      at sk for bpf prog to use.  The space will be allocated when the first bpf
      prog has created data for this particular sk.
      
      The design optimizes the bpf prog's lookup (optionally followed by an
      inline update).  bpf_spin_lock should be used if the inline update needs
      to be protected.
      
      BPF_MAP_TYPE_SK_STORAGE:
      -----------------------
      To define a bpf "sk-local-storage", a BPF_MAP_TYPE_SK_STORAGE map (new in
      this patch) needs to be created.  Multiple BPF_MAP_TYPE_SK_STORAGE maps can
      be created to fit different bpf progs' needs.  The map enforces
      BTF to allow printing the sk-local-storage during a system-wide
      sk dump (e.g. "ss -ta") in the future.
      
      The purpose of a BPF_MAP_TYPE_SK_STORAGE map is not to lookup/update/delete
      "sk-local-storage" data for a particular sk.
      Think of the map as the meta-data (or "type") of a "sk-local-storage".  This
      particular "type" of "sk-local-storage" data can then be stored in any sk.
      
      The main purposes of this map are:
      1. Define the size of a "sk-local-storage" type.
      2. Provide a userspace syscall API similar to other maps (e.g. lookup/update,
         map-id, map-btf...etc.)
      3. Keep track of all sk's storages of this "type" and clean them up
         when the map is freed.
      
      sk->sk_bpf_storage:
      ------------------
      The main lookup/update/delete is done on sk->sk_bpf_storage (which
      is a "struct bpf_sk_storage").  When doing a lookup,
      the "map" pointer is now used as the "key" to search on the
      sk_storage->list.  The "map" pointer is actually serving
      as the "type" of the "sk-local-storage" that is being
      requested.
      
      Lookup should ideally be as fast as indexing an array at a stable
      offset.  At the same time, it is not ideal to set a hard limit on the
      number of sk-local-storage "types" that the system can have.  Hence,
      this patch takes a cache approach.  The last search result from
      sk_storage->list is cached in sk_storage->cache[], which is a
      fixed-size array.  Each "sk-local-storage" type has a stable offset
      into the cache[] array.  In the future, a map flag could be introduced
      for cache opt-out/enforcement if it becomes necessary.
      
      The cache size is 16 (i.e. 16 types of "sk-local-storage").
      Programs can share maps.  On the program side, having a few bpf_progs
      running in the networking hotpath is already a lot.  A bpf_prog
      should already have consolidated its existing sock-keyed map usage
      to minimize the map lookup penalty.  16 leaves enough runway to grow.
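      
      A conceptual sketch of this cache idea (simplified types and names,
      not the kernel implementation): each map gets a stable cache index,
      and a hit in cache[] avoids walking the per-sk list.
      
          /* Conceptual sketch only -- not the kernel code. */
          #include <stddef.h>
          
          #define SK_STORAGE_CACHE_SIZE 16
          
          struct selem {                     /* one "sk-local-storage" elem   */
                  struct selem *next;        /* linked on the per-sk list     */
                  const void   *map;         /* which map ("type") this is    */
                  char          data[];
          };
          
          struct sk_storage {
                  struct selem *cache[SK_STORAGE_CACHE_SIZE];
                  struct selem *list;
          };
          
          static struct selem *lookup(struct sk_storage *st,
                                      const void *map, unsigned int idx)
          {
                  struct selem *e = st->cache[idx];
          
                  if (e && e->map == map)             /* fast path: cache hit */
                          return e;
          
                  for (e = st->list; e; e = e->next)  /* slow path: list walk */
                          if (e->map == map) {
                                  st->cache[idx] = e; /* refresh the slot     */
                                  return e;
                          }
                  return NULL;
          }
      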
      
      All sk-local-storage data will be removed from sk->sk_bpf_storage
      during sk destruction.
      
      bpf_sk_storage_get() and bpf_sk_storage_delete():
      ------------------------------------------------
      Instead of using bpf_map_(lookup|update|delete)_elem(),
      the bpf prog needs to use the new helpers bpf_sk_storage_get() and
      bpf_sk_storage_delete().  The verifier can then enforce the
      ARG_PTR_TO_SOCKET argument.  bpf_sk_storage_get() can also
      "create" a new elem if one does not exist in the sk, via the
      new BPF_SK_STORAGE_GET_F_CREATE flag.  An optional value can be
      provided as the initial value when using BPF_SK_STORAGE_GET_F_CREATE.
      BPF_MAP_TYPE_SK_STORAGE also supports bpf_spin_lock.  Together,
      this eliminates the potential use cases for an equivalent
      bpf_map_update_elem() API (for bpf_prog) in this patch.
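      
      For illustration, a minimal bpf prog sketch using these helpers.  The
      map, section, and counter names are made up, and the BTF-defined map
      syntax shown here follows the later libbpf convention:
      
          /* Sketch: keep a per-socket packet counter in sk-local storage. */
          #include <linux/bpf.h>
          #include <bpf/bpf_helpers.h>
          
          struct {
                  __uint(type, BPF_MAP_TYPE_SK_STORAGE);
                  __uint(map_flags, BPF_F_NO_PREALLOC);
                  __type(key, int);
                  __type(value, __u64);
          } sk_pkt_cnt SEC(".maps");
          
          SEC("cgroup_skb/egress")
          int count_egress(struct __sk_buff *skb)
          {
                  struct bpf_sock *sk = skb->sk;
                  __u64 *cnt;
          
                  if (!sk)
                          return 1;
                  sk = bpf_sk_fullsock(sk);
                  if (!sk)
                          return 1;
          
                  /* Create the elem on first use; a 0 init value means zeroed. */
                  cnt = bpf_sk_storage_get(&sk_pkt_cnt, sk, 0,
                                           BPF_SK_STORAGE_GET_F_CREATE);
                  if (cnt)
                          __sync_fetch_and_add(cnt, 1);
                  return 1;
          }
          
          char _license[] SEC("license") = "GPL";
      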
      
      Misc notes:
      ----------
      1. map_get_next_key is not supported.  From the userspace syscall
         perspective, the map has the socket fd as the key, while the map
         can be shared by pinned file or map-id.
      
         Since btf is enforced, the existing "ss" could be enhanced to pretty
         print the local-storage.
      
         Supporting a kernel-defined btf with the 4-tuple as the returned key
         could also be explored later.
      
      2. The sk->sk_lock cannot be acquired.  Atomic operations are used
         instead, e.g. cmpxchg is done on the sk->sk_bpf_storage ptr.
         Please refer to the source code comments for details on the
         synchronization cases and considerations.
      
      3. The memory is charged to sk->sk_omem_alloc, as the sk filter does.
      
      Benchmark:
      ---------
      Here is the benchmark data collected by turning on
      the "kernel.bpf_stats_enabled" sysctl.
      Two bpf progs are tested:
      
      One bpf prog with the usual bpf hashmap (max_entries = 8192) with the
      sk ptr as the key.  (The verifier is modified to support the sk ptr as
      the key, which should have shortened the key lookup time.)
      
      Another bpf prog is with the new BPF_MAP_TYPE_SK_STORAGE.
      
      Both store a "u32 cnt", do a lookup on "egress_skb/cgroup" for
      each egress skb and then bump the cnt.  netperf is used to drive
      data with 4096 connected UDP sockets.
      
      BPF_MAP_TYPE_HASH with a modified verifier (152ns per bpf run)
      27: cgroup_skb  name egress_sk_map  tag 74f56e832918070b run_time_ns 58280107540 run_cnt 381347633
          loaded_at 2019-04-15T13:46:39-0700  uid 0
          xlated 344B  jited 258B  memlock 4096B  map_ids 16
          btf_id 5
      
      BPF_MAP_TYPE_SK_STORAGE in this patch (66ns per bpf run)
      30: cgroup_skb  name egress_sk_stora  tag d4aa70984cc7bbf6 run_time_ns 25617093319 run_cnt 390989739
          loaded_at 2019-04-15T13:47:54-0700  uid 0
          xlated 168B  jited 156B  memlock 4096B  map_ids 17
          btf_id 6
      
      Here is a high-level picture of how the objects are organized:
      
             sk
          ┌──────┐
          │      │
          │      │
          │      │
          │*sk_bpf_storage─────▶ bpf_sk_storage
          └──────┘                 ┌───────┐
                       ┌───────────┤ list  │
                       │           │       │
                       │           │       │
                       │           │       │
                       │           └───────┘
                       │
                       │     elem
                       │  ┌────────┐
                       ├─▶│ snode  │
                       │  ├────────┤
                       │  │  data  │          bpf_map
                       │  ├────────┤        ┌─────────┐
                       │  │map_node│◀─┬─────┤  list   │
                       │  └────────┘  │     │         │
                       │              │     │         │
                       │     elem     │     │         │
                       │  ┌────────┐  │     └─────────┘
                       └─▶│ snode  │  │
                          ├────────┤  │
         bpf_map          │  data  │  │
       ┌─────────┐        ├────────┤  │
       │  list   ├───────▶│map_node│  │
       │         │        └────────┘  │
       │         │                    │
       │         │           elem     │
       └─────────┘        ┌────────┐  │
                       ┌─▶│ snode  │  │
                       │  ├────────┤  │
                       │  │  data  │  │
                       │  ├────────┤  │
                       │  │map_node│◀─┘
                       │  └────────┘
                       │
                       │
                       │          ┌───────┐
           sk          └──────────┤ list  │
        ┌──────┐                  │       │
        │      │                  │       │
        │      │                  │       │
        │      │                  └───────┘
        │*sk_bpf_storage───────▶bpf_sk_storage
        └──────┘
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • Merge branch 'writeable-bpf-tracepoints' · 3745dc24
      Alexei Starovoitov authored
      Matt Mullins says:
      
      ====================
      This adds an opt-in interface for tracepoints to expose a writable context to
      BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE programs that are attached, while
      supporting read-only access from existing BPF_PROG_TYPE_RAW_TRACEPOINT
      programs, as well as from non-BPF-based tracepoints.
      
      The initial motivation is to support tracing that can be observed from the
      remote end of an NBD socket, e.g. by adding flags to the struct nbd_request
      header.  Earlier attempts included adding an NBD-specific tracepoint fd, but in
      code review, I was recommended to implement it more generically -- as a result,
      this patchset is far simpler than my initial try.
      
      v4->v5:
        * rebased onto bpf-next/master and fixed merge conflicts
        * "tools: sync bpf.h" also syncs comments that have previously changed
          in bpf-next
      
      v3->v4:
        * fixed a silly copy/paste typo in include/trace/events/bpf_test_run.h
          (_TRACE_NBD_H -> _TRACE_BPF_TEST_RUN_H)
        * fixed incorrect/misleading wording in patch 1's commit message,
          since the pointer cannot be directly dereferenced in a
          BPF_PROG_TYPE_RAW_TRACEPOINT
        * cleaned up the error message wording if the prog_tests fail
        * Addressed feedback from Yonghong
          * reject non-pointer-sized accesses to the buffer pointer
          * use sizeof(struct nbd_request) as one-byte-past-the-end in
            raw_tp_writable_reject_nbd_invalid.c
          * use BPF_MOV64_IMM instead of BPF_LD_IMM64
      
      v2->v3:
        * Andrew addressed Josef's comments:
          * C-style commenting in nbd.c
          * Collapsed identical events into a single DECLARE_EVENT_CLASS.
            This saves about 2kB of kernel text
      
      v1->v2:
        * add selftests
          * sync tools/include/uapi/linux/bpf.h
        * reject variable offset into the buffer
        * add string representation of PTR_TO_TP_BUFFER to reg_type_str
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests: bpf: test writable buffers in raw tps · e950e843
      Matt Mullins authored
      This tests that:
        * a BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE cannot be attached if it
          uses either:
          * a variable offset to the tracepoint buffer, or
          * an offset beyond the size of the tracepoint buffer
        * a tracer can modify the buffer provided when attached to a writable
          tracepoint in bpf_prog_test_run
      Signed-off-by: Matt Mullins <mmullins@fb.com>
      Acked-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • tools: sync bpf.h · 4635b0ae
      Matt Mullins authored
      This adds BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE, and fixes up the
      
      	error: enumeration value ‘BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE’ not handled in switch [-Werror=switch-enum]
      
      build errors it would otherwise cause in libbpf.
      Signed-off-by: Matt Mullins <mmullins@fb.com>
      Acked-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • nbd: add tracepoints for send/receive timing · 2abd2de7
      Andrew Hall authored
      This adds four tracepoints to nbd, enabling separate tracing of payload
      and header sending/receipt.
      
      In the send path for headers that have already been sent, we also
      explicitly initialize the handle so it can be referenced by the later
      tracepoint.
      Signed-off-by: Andrew Hall <hall@fb.com>
      Signed-off-by: Matt Mullins <mmullins@fb.com>
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • nbd: trace sending nbd requests · ea106722
      Matt Mullins authored
      This adds a tracepoint that can both observe the nbd request being sent
      to the server and modify that request, e.g., setting a flag in
      the request that will cause the server to collect detailed tracing data.
      
      The struct request * being handled is included to permit correlation
      with the block tracepoints.
      Signed-off-by: Matt Mullins <mmullins@fb.com>
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: add writable context for raw tracepoints · 9df1c28b
      Matt Mullins authored
      This is an opt-in interface that allows a tracepoint to provide a safe
      buffer that can be written from a BPF_PROG_TYPE_RAW_TRACEPOINT program.
      The size of the buffer must be a compile-time constant, and is checked
      before allowing a BPF program to attach to a tracepoint that uses this
      feature.
      
      The pointer to this buffer will be the first argument of tracepoints
      that opt in; the pointer is valid and can be bpf_probe_read() by both
      BPF_PROG_TYPE_RAW_TRACEPOINT and BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE
      programs that attach to such a tracepoint, but the buffer to which it
      points may only be written by the latter.
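      
      A hedged sketch of the program side.  The tracepoint name, buffer
      layout, and the "raw_tracepoint.w/" section convention are assumptions
      for illustration; per this patch, the first tracepoint argument carries
      the pointer to the writable buffer:
      
          /* Sketch: a BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE program that
           * sets a flag byte inside the opt-in writable buffer. */
          #include <linux/bpf.h>
          #include <bpf/bpf_helpers.h>
          
          SEC("raw_tracepoint.w/nbd_send_request")  /* hypothetical attach point */
          int set_trace_flag(struct bpf_raw_tracepoint_args *ctx)
          {
                  /* For writable tracepoints, args[0] points to the writable
                   * buffer; accesses are verified against its compile-time size. */
                  unsigned char *buf = (unsigned char *)ctx->args[0];
          
                  buf[0] |= 0x1;  /* direct write, only allowed for _WRITABLE progs */
                  return 0;
          }
          
          char _license[] SEC("license") = "GPL";
      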
      Signed-off-by: Matt Mullins <mmullins@fb.com>
      Acked-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf, arm64: use more scalable stadd over ldxr / stxr loop in xadd · 34b8ab09
      Daniel Borkmann authored
      Since the ARMv8.1 supplement introduced LSE atomic instructions back in
      2016, let's add support for STADD and use it in favor of the LDXR / STXR
      loop for the XADD mapping if available.  STADD is encoded as an alias for
      LDADD with XZR as the destination register, therefore add LDADD to the
      instruction encoder along with STADD as a special case and use it in the
      JIT for CPUs that advertise LSE atomics in the CPUID register.  If the
      immediate offset in the BPF XADD insn is 0, then use the dst register
      directly instead of a temporary one.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
      Acked-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf, arm64: remove prefetch insn in xadd mapping · 8968c67a
      Daniel Borkmann authored
      Prefetch-with-intent-to-write is currently part of the XADD mapping in
      the AArch64 JIT and follows the kernel's implementation of atomic_add.
      This may interfere with other threads executing the LDXR/STXR loop,
      leading to potential starvation and fairness issues. Drop the optional
      prefetch instruction.
      
      Fixes: 85f68fe8 ("bpf, arm64: implement jiting of BPF_XADD")
      Reported-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
      Acked-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  2. 26 Apr, 2019 6 commits
    • Merge branch 'btf-dump' · 0c0cad2c
      Alexei Starovoitov authored
      Andrii Nakryiko says:
      
      ====================
      This patch set adds a new `bpftool btf dump` sub-command, which allows
      dumping BTF contents (only types for now). Currently it only outputs
      low-level content, almost 1:1 with the binary BTF format, but follow-up
      patches will add the ability to dump BTF types as a compilable C header
      file. JSON output is supported as well.
      
      Patch #1 adds `btf` sub-command, dumping BTF types in human-readable format.
      It also implements reading .BTF data from ELF file.
      Patch #2 adds minimal documentation with output format examples and different
      ways to specify source of BTF data.
      Patch #3 adds support for btf command in bash-completion/bpftool script.
      Patch #4 fixes minor indentation issue in bash-completion script.
      
      The output format mostly follows the existing BPF verifier log format, but
      deviates from it in a few places. More details are in the commit message
      for patch 1.
      
      Examples of output for all supported BTF kinds are in patch #2 as part of
      the documentation. Some field names are quite verbose and I'd rather
      shorten them if staying very close to the BPF verifier names is not a
      necessity, but in this patch I left them exactly the same as in the
      verifier log.
      
      v3->v4:
        - reverse Christmas tree (Quentin)
        - better docs (Quentin)
      
      v2->v3:
        - make map's key|value|kv|all suggestion more precise (Quentin)
        - fix default case indentations (Quentin)
      
      v1->v2:
        - fix unnecessary trailing whitespaces in bpftool-btf.rst (Yonghong)
        - add btf in main.c for a list of possible OBJECTs
        - handle unknown keyword under `bpftool btf dump` (Yonghong)
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpftool: fix indentation in bash-completion/bpftool · 8ed1875b
      Andrii Nakryiko authored
      Fix misaligned default case branch for `prog dump` sub-command.
      Reported-by: Quentin Monnet <quentin.monnet@netronome.com>
      Cc: Yonghong Song <yhs@fb.com>
      Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpftool: add bash completions for btf command · 4a714fee
      Andrii Nakryiko authored
      Add full support for btf command in bash-completion script.
      
      Cc: Quentin Monnet <quentin.monnet@netronome.com>
      Cc: Yonghong Song <yhs@fb.com>
      Cc: Alexei Starovoitov <ast@fb.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpftool/docs: add btf sub-command documentation · ca253339
      Andrii Nakryiko authored
      Document usage and sample output format for `btf dump` sub-command.
      
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Alexei Starovoitov <ast@fb.com>
      Cc: Yonghong Song <yhs@fb.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpftool: add ability to dump BTF types · c93cc690
      Andrii Nakryiko authored
      Add a new `btf dump` sub-command to bpftool. It dumps a human-readable,
      low-level representation of BTF types. BTF can be retrieved from a few
      different sources:
        - from BTF object by ID;
        - from PROG, if it has associated BTF;
        - from MAP, if it has associated BTF data; it's possible to narrow
          down types to either key type, value type, both, or all BTF types;
        - from ELF file (.BTF section).
      
      The output format mostly follows the BPF verifier log format, with a few
      notable exceptions:
        - all the type/field/param/etc names are enclosed in single quotes to
          allow easier grepping and to stand out a little bit more;
        - FUNC_PROTO output follows the STRUCT/UNION/ENUM format of having one
          line per argument; this is more uniform and allows easy grepping, as
          opposed to the succinct but inconvenient format that the BPF verifier
          log uses.
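      
      For example, invocations for the different sources look like this
      (IDs and paths are illustrative):
      
          # dump a BTF object by id
          $ bpftool btf dump id 1337
          # dump the BTF associated with a prog / with a map's value type
          $ bpftool btf dump prog id 3
          $ bpftool btf dump map id 123 value
          # dump the .BTF section of an ELF file
          $ bpftool btf dump file ./prog.o
      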
      
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Alexei Starovoitov <ast@fb.com>
      Cc: Yonghong Song <yhs@fb.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpftool: Fix errno variable usage · 77d76426
      Benjamin Poirier authored
      The test meant to use the saved value of errno. Given the current code, it
      makes no practical difference however.
      
      Fixes: bf598a8f ("bpftool: Improve handling of ENOENT on map dumps")
      Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
      Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
      Acked-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  3. 25 Apr, 2019 7 commits
    • bpftool: show flow_dissector attachment status · 7f0c57fe
      Stanislav Fomichev authored
      Right now there is no way to query whether a BPF flow_dissector program
      is attached to a network namespace or not. The previous commit added
      support for querying that info; show it when doing `bpftool net`:
      
      $ bpftool prog loadall ./bpf_flow.o \
      	/sys/fs/bpf/flow type flow_dissector \
      	pinmaps /sys/fs/bpf/flow
      $ bpftool prog
      3: flow_dissector  name _dissect  tag 8c9e917b513dd5cc  gpl
              loaded_at 2019-04-23T16:14:48-0700  uid 0
              xlated 656B  jited 461B  memlock 4096B  map_ids 1,2
              btf_id 1
      ...
      
      $ bpftool net -j
      [{"xdp":[],"tc":[],"flow_dissector":[]}]
      
      $ bpftool prog attach pinned \
      	/sys/fs/bpf/flow/flow_dissector flow_dissector
      $ bpftool net -j
      [{"xdp":[],"tc":[],"flow_dissector":["id":3]}]
      
      Doesn't show up in a different net namespace:
      $ ip netns add test
      $ ip netns exec test bpftool net -j
      [{"xdp":[],"tc":[],"flow_dissector":[]}]
      
      Non-json output:
      $ bpftool net
      xdp:
      
      tc:
      
      flow_dissector:
      id 3
      
      v2:
      * initialization order (Jakub Kicinski)
      * clear errno for batch mode (Quentin Monnet)
      Signed-off-by: Stanislav Fomichev <sdf@google.com>
      Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: support BPF_PROG_QUERY for BPF_FLOW_DISSECTOR attach_type · 118c8e9a
      Stanislav Fomichev authored
      target_fd is the target network namespace. If there is a flow dissector
      BPF program attached to that namespace, its (single) id is returned.
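      
      A short userspace sketch of such a query, assuming libbpf's
      bpf_prog_query() wrapper (variable names and the netns path are
      illustrative):
      
          /* Sketch: query which flow dissector program (if any) is attached
           * to the current network namespace. */
          #include <fcntl.h>
          #include <stdio.h>
          #include <unistd.h>
          #include <linux/bpf.h>
          #include <bpf/bpf.h>
          
          int main(void)
          {
                  __u32 prog_ids[1] = {};
                  __u32 prog_cnt = 1;
                  int netns_fd, err;
          
                  netns_fd = open("/proc/self/ns/net", O_RDONLY);
                  if (netns_fd < 0)
                          return 1;
          
                  /* target_fd is the netns fd; at most one id comes back. */
                  err = bpf_prog_query(netns_fd, BPF_FLOW_DISSECTOR, 0, NULL,
                                       prog_ids, &prog_cnt);
                  if (!err && prog_cnt)
                          printf("flow_dissector prog id: %u\n", prog_ids[0]);
          
                  close(netns_fd);
                  return err ? 1 : 0;
          }
      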
      
      v5:
      * drop net ref right after rcu unlock (Daniel Borkmann)
      
      v4:
      * add missing put_net (Jann Horn)
      
      v3:
      * add missing inline to skb_flow_dissector_prog_query static def
        (kbuild test robot <lkp@intel.com>)
      
      v2:
      * don't sleep in rcu critical section (Jakub Kicinski)
      * check input prog_cnt (exit early)
      
      Cc: Jann Horn <jannh@google.com>
      Signed-off-by: Stanislav Fomichev <sdf@google.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • samples: bpf: add hbm sample to .gitignore · ead442a0
      Daniel T. Lee authored
      This commit adds hbm to .gitignore, which is currently omitted from
      the ignore file.
      Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • libbpf: fix samples/bpf build failure due to undefined UINT32_MAX · 32e621e5
      Daniel T. Lee authored
      Currently, building bpf samples will cause the following error.
      
          ./tools/lib/bpf/bpf.h:132:27: error: 'UINT32_MAX' undeclared here (not in a function) ..
           #define BPF_LOG_BUF_SIZE (UINT32_MAX >> 8) /* verifier maximum in kernels <= 5.1 */
                                     ^
          ./samples/bpf/bpf_load.h:31:25: note: in expansion of macro 'BPF_LOG_BUF_SIZE'
           extern char bpf_log_buf[BPF_LOG_BUF_SIZE];
                                   ^~~~~~~~~~~~~~~~
      
      Due to commit 4519efa6 ("libbpf: fix BPF_LOG_BUF_SIZE off-by-one error"),
      the hard-coded size of BPF_LOG_BUF_SIZE has been replaced with UINT32_MAX,
      which is defined in the <stdint.h> header.
      
      Even with this change, the bpf selftests run fine since they are built
      with clang, which includes headers (-idirafter) from clang/6.0.0/include
      (it has <stdint.h>).
      
          clang -I. -I./include/uapi -I../../../include/uapi -idirafter /usr/local/include -idirafter /usr/include \
          -idirafter /usr/lib/llvm-6.0/lib/clang/6.0.0/include -idirafter /usr/include/x86_64-linux-gnu \
          -Wno-compare-distinct-pointer-types -O2 -target bpf -emit-llvm -c progs/test_sysctl_prog.c -o - | \
          llc -march=bpf -mcpu=generic  -filetype=obj -o /linux/tools/testing/selftests/bpf/test_sysctl_prog.o
      
      But bpf samples are compiled with GCC, which only searches and includes
      the headers declared in the target file. As '#include <stdint.h>' hasn't
      been declared in tools/lib/bpf/bpf.h, it causes the build failure of bpf
      samples.
      
          gcc -Wp,-MD,./samples/bpf/.sockex3_user.o.d -Wall -Wmissing-prototypes -Wstrict-prototypes \
          -O2 -fomit-frame-pointer -std=gnu89 -I./usr/include -I./tools/lib/ -I./tools/testing/selftests/bpf/ \
          -I./tools/lib/ -I./tools/include -I./tools/perf -c -o ./samples/bpf/sockex3_user.o ./samples/bpf/sockex3_user.c;
      
      This commit adds a declaration of '#include <stdint.h>' to
      tools/lib/bpf/bpf.h to fix this problem.
      Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
      Acked-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • Merge branch 'libbpf-fixes' · 0e33d334
      Alexei Starovoitov authored
      Daniel Borkmann says:
      
      ====================
      Two small fixes in relation to global data handling. Thanks!
      ====================
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf, libbpf: fix segfault in bpf_object__init_maps' pr_debug statement · 4f8827d2
      Daniel Borkmann authored
      Ran into it while testing; in bpf_object__init_maps(), data can be NULL
      in the case where no map section is present. Therefore we simply cannot
      access data->d_size before the NULL test. Move the pr_debug() to where
      it's safe to access it.
      
      Fixes: d859900c ("bpf, libbpf: support global data/bss/rodata sections")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf, libbpf: handle old kernels more gracefully wrt global data sections · 8837fe5d
      Daniel Borkmann authored
      Andrii reported a corner case where e.g. global static data is present
      in the BPF ELF file in the form of a .data/.bss/.rodata section, but
      without any relocations to it. Such programs could be loaded before
      commit d859900c ("bpf, libbpf: support global data/bss/rodata sections"),
      whereas afterwards, if the kernel lacks support, loading would fail.
      
      Add a probing mechanism which skips setting up libbpf-internal maps in
      case of missing kernel support. In the presence of relocation entries,
      we abort the load attempt.
      
      Fixes: d859900c ("bpf, libbpf: support global data/bss/rodata sections")
      Reported-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  4. 23 Apr, 2019 14 commits