1. 30 Jul, 2021 18 commits
  2. 29 Jul, 2021 2 commits
    • bpf: Emit better log message if bpf_iter ctx arg btf_id == 0 · d3621642
      Yonghong Song authored
      To avoid kernel build failures due to missing functions/types
      referenced by .BTF_ids, the patch ([1]) fills in btf_id 0 for
      these types.
      
      In the bpf verifier, for the percpu-variable and helper-returning-btf_id
      cases, the verifier already emits a proper warning, something like:
        verbose(env, "Helper has invalid btf_id in R%d\n", regno);
        verbose(env, "invalid return type %d of func %s#%d\n",
                fn->ret_type, func_id_name(func_id), func_id);
      
      But this is not the case for bpf_iter context arguments.
      I hacked resolve_btfids to encode btf_id 0 for struct task_struct.
      With `./test_progs -n 7/5`, I got:
        0: (79) r2 = *(u64 *)(r1 +0)
        func 'bpf_iter_task' arg0 has btf_id 29739 type STRUCT 'bpf_iter_meta'
        ; struct seq_file *seq = ctx->meta->seq;
        1: (79) r6 = *(u64 *)(r2 +0)
        ; struct task_struct *task = ctx->task;
        2: (79) r7 = *(u64 *)(r1 +8)
        ; if (task == (void *)0) {
        3: (55) if r7 != 0x0 goto pc+11
        ...
        ; BPF_SEQ_PRINTF(seq, "%8d %8d\n", task->tgid, task->pid);
        26: (61) r1 = *(u32 *)(r7 +1372)
        Type '(anon)' is not a struct
      
      Basically, the verifier returns btf_id 0 for task_struct.
      Later on, when the code tries to access task->tgid, the
      verifier correctly complains that the type is '(anon)' and
      is not a struct, but users still need to backtrack to find
      out what went wrong.
      
      Let us catch the invalid btf_id 0 earlier
      and provide a better message indicating that the btf_id is wrong.
      The new error message looks like this:
        R1 type=ctx expected=fp
        ; struct seq_file *seq = ctx->meta->seq;
        0: (79) r2 = *(u64 *)(r1 +0)
        func 'bpf_iter_task' arg0 has btf_id 29739 type STRUCT 'bpf_iter_meta'
        ; struct seq_file *seq = ctx->meta->seq;
        1: (79) r6 = *(u64 *)(r2 +0)
        ; struct task_struct *task = ctx->task;
        2: (79) r7 = *(u64 *)(r1 +8)
        invalid btf_id for context argument offset 8
        invalid bpf_context access off=8 size=8
      
      [1] https://lore.kernel.org/bpf/20210727132532.2473636-1-hengqi.chen@gmail.com/

      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20210728183025.1461750-1-yhs@fb.com
      d3621642
    • tools/resolve_btfids: Emit warnings and patch zero id for missing symbols · 5aad0368
      Hengqi Chen authored
      Kernel functions referenced by .BTF_ids may be changed from global to static
      and get inlined, or get renamed/removed, and thus disappear from BTF.
      This causes kernel build failures when resolve_btfids patches ids for symbols
      in .BTF_ids in vmlinux. Update resolve_btfids to emit warning messages and
      patch a zero id for missing symbols instead of aborting the kernel build.
      Suggested-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20210727132532.2473636-2-hengqi.chen@gmail.com
      5aad0368
  3. 27 Jul, 2021 5 commits
  4. 26 Jul, 2021 6 commits
  5. 24 Jul, 2021 3 commits
  6. 23 Jul, 2021 6 commits
    • libbpf: Add bpf_map__pin_path function · e244d34d
      Evgeniy Litvinenko authored
      Add bpf_map__pin_path, so that the inconsistently named
      bpf_map__get_pin_path can be deprecated later. This is part of the
      effort towards libbpf v1.0: https://github.com/libbpf/libbpf/issues/307
      
      Also, add a selftest for the new function.
      Signed-off-by: Evgeniy Litvinenko <evgeniyl@fb.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20210723221511.803683-1-evgeniyl@fb.com
      e244d34d
    • Merge branch 'bpf: Allow bpf tcp iter to do bpf_(get|set)sockopt' · d9e8d14b
      Andrii Nakryiko authored
      Martin KaFai says:
      
      ====================
      
      This set is to allow bpf tcp iter to call bpf_(get|set)sockopt.
      
      With bpf-tcp-cc, new algorithm rollouts happen more often.  Instead of
      restarting the applications to pick up the new tcp-cc, this set
      allows the bpf tcp iter to call bpf_(get|set)sockopt(TCP_CONGESTION).
      It is not limited to TCP_CONGESTION: the bpf tcp iter can call
      bpf_(get|set)sockopt() with other options.  The bpf tcp iter can read
      all the fields of a tcp_sock, so there is a lot of flexibility
      in selecting the desired sk to do setsockopt() on, e.g. it can test for
      TCP_LISTEN only and leave the established connections untouched,
      or check the addr/port, or check the current tcp-cc name, etc.
      
      Patches 1-4 are cleanup and prep work in the tcp and bpf seq_file code.

      Patch 5 makes the tcp seq_file iterate on the
      port+addr lhash2 instead of the port-only listening_hash.

      Patch 6 makes the bpf tcp iter do batching, which
      then allows lock_sock; lock_sock is needed for setsockopt.

      Patch 7 allows the bpf tcp iter to call bpf_(get|set)sockopt.
      
      v2:
      - Use __GFP_NOWARN in patch 6
      - Add bpf_getsockopt() in patch 7 to give a symmetrical user experience.
        selftest in patch 8 is changed to also cover bpf_getsockopt().
      - Remove CAP_NET_ADMIN check in patch 7. Tracing bpf prog has already
        required CAP_SYS_ADMIN or CAP_PERFMON.
      - Move some def macros to bpf_tracing_net.h in patch 8
      ====================
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      d9e8d14b
    • bpf: selftest: Test batching and bpf_(get|set)sockopt in bpf tcp iter · eed92afd
      Martin KaFai Lau authored
      This patch adds tests for the batching and bpf_(get|set)sockopt in
      bpf tcp iter.
      
      It first creates:
      a) one non-SO_REUSEPORT listener in lhash2.
      b) 256 passive and active fds connected to the listener in (a).
      c) 256 SO_REUSEPORT listeners in one of the lhash2 buckets.
      
      The test sets all listeners and connections to bpf_cubic before
      running the bpf iter.
      
      The bpf iter then calls setsockopt(TCP_CONGESTION) to switch
      each listener and connection from bpf_cubic to bpf_dctcp.
      
      The bpf iter has a random_retry mode such that it can return EAGAIN
      to userspace in the middle of a batch.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20210701200625.1036874-1-kafai@fb.com
      eed92afd
    • bpf: tcp: Support bpf_(get|set)sockopt in bpf tcp iter · 3cee6fb8
      Martin KaFai Lau authored
      This patch allows the bpf tcp iter to call bpf_(get|set)sockopt.
      To allow a specific bpf iter (tcp here) to call a set of helpers,
      a get_func_proto function pointer is added to bpf_iter_reg.
      The bpf iter is a tracing prog, which currently requires
      CAP_PERFMON or CAP_SYS_ADMIN, so this patch does not
      impose additional capability checks for bpf_(get|set)sockopt.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20210701200619.1036715-1-kafai@fb.com
      3cee6fb8
    • bpf: tcp: Bpf iter batching and lock_sock · 04c7820b
      Martin KaFai Lau authored
      This patch does batching and lock_sock for the bpf tcp iter.
      It does not affect the proc fs iteration.
      
      With bpf-tcp-cc, new algo rollout happens more often.  Instead of
      restarting the application to pick up the new tcp-cc, the next patch
      will allow bpf iter to do setsockopt(TCP_CONGESTION).  This requires
      locking the sock.
      
      Also, unlike the proc iteration (cat /proc/net/tcp[6]), the bpf iter
      can inspect all fields of a tcp_sock.  It will be useful to have a
      consistent view on some of the fields (e.g. the ones reported in
      tcp_get_info() that also acquires the sock lock).
      
      Double lock: locking the bucket first and then locking the sock could
      lead to deadlock.  This patch takes a batching approach similar to
      inet_diag.  While holding the bucket lock, it batches a number of sockets
      into an array first and then unlocks the bucket.  Before doing show(),
      it then calls lock_sock_fast().
      
      In a machine with ~400k connections, the maximum number of
      sk in a bucket of the established hashtable is 7.  0.02% of
      the established connections fall into this bucket size.
      
      For the listen hash (port+addr lhash2), the bucket is usually very
      small as well, except for the SO_REUSEPORT use case, in which
      userspace may have one SO_REUSEPORT socket per thread.

      When the whole bucket is batched, batching also minimizes the chance
      of missing a sock in the setsockopt use case.
      This patch starts with a batch array of INIT_BATCH_SZ (16),
      which is enough for the most common cases.  bpf_iter_tcp_batch()
      will try to realloc to a larger array to handle exceptional cases (e.g.
      the SO_REUSEPORT case in lhash2).
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20210701200613.1036157-1-kafai@fb.com
      04c7820b
    • tcp: seq_file: Replace listening_hash with lhash2 · 05c0b357
      Martin KaFai Lau authored
      This patch moves the tcp seq_file iteration on listeners
      from the port-only listening_hash to the port+addr lhash2.
      
      When iterating from the bpf iter, the next patch will need to
      lock the socket so that the bpf iter can call setsockopt (e.g. to
      change TCP_CONGESTION).  To avoid locking the bucket and then locking
      the sock, the bpf iter will first batch some sockets from the same bucket
      and then unlock the bucket.  If the bucket size is small (which it
      usually is), it is easier to batch the whole bucket, so that it is less
      likely to miss a setsockopt on a socket due to changes in the bucket.
      
      However, the port-only listening_hash can have many listeners
      hashed to one bucket (e.g. many individual VIP(s):443, further
      multiplied by the number of SO_REUSEPORT sockets).  We have seen bucket
      sizes in the tens of thousands.  The chance of changes happening
      in some popular port buckets (e.g. 443) is also high.
      
      The port+addr lhash2 was introduced to solve this large-listener-bucket
      issue.  The listening_hash usage has also already been replaced with
      lhash2 in the fast path, inet[6]_lookup_listener().  This patch follows
      the same direction and iterates lhash2
      instead of listening_hash.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20210701200606.1035783-1-kafai@fb.com
      05c0b357