1. 28 Feb, 2020 6 commits
    • Alexei Starovoitov's avatar
      Merge branch 'bpf_sk_storage_via_inet_diag' · 812285fa
      Alexei Starovoitov authored
      Martin KaFai Lau says:
      
      ====================
      The bpf_prog can store specific info to a sk by using bpf_sk_storage.
      In other words, a sk can be extended by a bpf_prog.
      
      This series is to support providing bpf_sk_storage data during inet_diag's
      dump.  The primary target is the usage like iproute2's "ss".
      
      The first two patches are refactoring works in inet_diag to make
      adding bpf_sk_storage support easier.  The next two patches do
      the actual work.
      
      Please see individual patch for details.
      
      v2:
      - Add commit message for u16 to u32 change in min_dump_alloc in Patch 4 (Song)
      - Add comment to explain the !skb->len check in __inet_diag_dump in Patch 4.
      - Do the map->map_type check earlier in Patch 3 for readability.
      ====================
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      812285fa
    • Martin KaFai Lau's avatar
      bpf: inet_diag: Dump bpf_sk_storages in inet_diag_dump() · 085c20ca
      Martin KaFai Lau authored
      This patch will dump out the bpf_sk_storages of a sk
      if the request has the INET_DIAG_REQ_SK_BPF_STORAGES nlattr.
      
      An array of SK_DIAG_BPF_STORAGE_REQ_MAP_FD can be specified in
      INET_DIAG_REQ_SK_BPF_STORAGES to select which bpf_sk_storage to dump.
      If no map_fd is specified, all bpf_sk_storages of a sk will be dumped.
      
      bpf_sk_storages can be added to the system at runtime.  It is difficult
      to find a proper static value for cb->min_dump_alloc.
      
      This patch learns the nlattr size required to dump the bpf_sk_storages
      of a sk.  If it happens to be the very first nlmsg of a dump and it
      cannot fit the needed bpf_sk_storages,  it will try to expand the
      skb by "pskb_expand_head()".
      
      Instead of expanding it in inet_sk_diag_fill(), it is expanded at a
      sleepable context in __inet_diag_dump() so __GFP_DIRECT_RECLAIM can
      be used.  In __inet_diag_dump(), it will retry as long as the
      skb is empty and the cb->min_dump_alloc becomes larger than before.
      cb->min_dump_alloc is bounded by KMALLOC_MAX_SIZE.  The min_dump_alloc
      is also changed from 'u16' to 'u32' to accommodate a sk that may have
      a few large bpf_sk_storages.
      
      The updated cb->min_dump_alloc will also be used to allocate the skb in
      the next dump.  This logic already exists in netlink_dump().
      
      Here is the sample output of a locally modified 'ss' and it could be made
      more readable by using BTF later:
      [root@arch-fb-vm1 ~]# ss --bpf-map-id 14 --bpf-map-id 13 -t6an 'dst [::1]:8989'
      State Recv-Q Send-Q Local Address:Port  Peer Address:PortProcess
      ESTAB 0      0              [::1]:51072        [::1]:8989
      	 bpf_map_id:14 value:[ 3feb ]
      	 bpf_map_id:13 value:[ 3f ]
      ESTAB 0      0              [::1]:51070        [::1]:8989
      	 bpf_map_id:14 value:[ 3feb ]
      	 bpf_map_id:13 value:[ 3f ]
      
      [root@arch-fb-vm1 ~]# ~/devshare/github/iproute2/misc/ss --bpf-maps -t6an 'dst [::1]:8989'
      State         Recv-Q         Send-Q                   Local Address:Port                    Peer Address:Port         Process
      ESTAB         0              0                                [::1]:51072                          [::1]:8989
      	 bpf_map_id:14 value:[ 3feb ]
      	 bpf_map_id:13 value:[ 3f ]
      	 bpf_map_id:12 value:[ 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000... total:65407 ]
      ESTAB         0              0                                [::1]:51070                          [::1]:8989
      	 bpf_map_id:14 value:[ 3feb ]
      	 bpf_map_id:13 value:[ 3f ]
      	 bpf_map_id:12 value:[ 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000... total:65407 ]
      Signed-off-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarSong Liu <songliubraving@fb.com>
      Link: https://lore.kernel.org/bpf/20200225230427.1976129-1-kafai@fb.com
      085c20ca
    • Martin KaFai Lau's avatar
      bpf: INET_DIAG support in bpf_sk_storage · 1ed4d924
      Martin KaFai Lau authored
      This patch adds INET_DIAG support to bpf_sk_storage.
      
      1. Although this series adds bpf_sk_storage diag capability to inet sk,
         bpf_sk_storage is in general applicable to all fullsock.  Hence, the
         bpf_sk_storage logic will operate on SK_DIAG_* nlattr.  The caller
         will pass in its specific nesting nlattr (e.g. INET_DIAG_*) as
         the argument.
      
      2. The request will be like:
      	INET_DIAG_REQ_SK_BPF_STORAGES (nla_nest) (defined in latter patch)
      		SK_DIAG_BPF_STORAGE_REQ_MAP_FD (nla_put_u32)
      		SK_DIAG_BPF_STORAGE_REQ_MAP_FD (nla_put_u32)
      		......
      
         Considering there could have multiple bpf_sk_storages in a sk,
         instead of reusing INET_DIAG_INFO ("ss -i"),  the user can select
         some specific bpf_sk_storage to dump by specifying an array of
         SK_DIAG_BPF_STORAGE_REQ_MAP_FD.
      
         If no SK_DIAG_BPF_STORAGE_REQ_MAP_FD is specified (i.e. an empty
         INET_DIAG_REQ_SK_BPF_STORAGES), it will dump all bpf_sk_storages
         of a sk.
      
      3. The reply will be like:
      	INET_DIAG_BPF_SK_STORAGES (nla_nest) (defined in latter patch)
      		SK_DIAG_BPF_STORAGE (nla_nest)
      			SK_DIAG_BPF_STORAGE_MAP_ID (nla_put_u32)
      			SK_DIAG_BPF_STORAGE_MAP_VALUE (nla_reserve_64bit)
      		SK_DIAG_BPF_STORAGE (nla_nest)
      			SK_DIAG_BPF_STORAGE_MAP_ID (nla_put_u32)
      			SK_DIAG_BPF_STORAGE_MAP_VALUE (nla_reserve_64bit)
      		......
      
      4. Unlike other INET_DIAG info of a sk which is pretty static, the size
         required to dump the bpf_sk_storage(s) of a sk is dynamic as the
         system adding more bpf_sk_storage_map.  It is hard to set a static
         min_dump_alloc size.
      
         Hence, this series learns it at the runtime and adjust the
         cb->min_dump_alloc as it iterates all sk(s) of a system.  The
         "unsigned int *res_diag_size" in bpf_sk_storage_diag_put()
         is for this purpose.
      
         The next patch will update the cb->min_dump_alloc as it
         iterates the sk(s).
      Signed-off-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarSong Liu <songliubraving@fb.com>
      Link: https://lore.kernel.org/bpf/20200225230421.1975729-1-kafai@fb.com
      1ed4d924
    • Martin KaFai Lau's avatar
      inet_diag: Move the INET_DIAG_REQ_BYTECODE nlattr to cb->data · 0df6d328
      Martin KaFai Lau authored
      The INET_DIAG_REQ_BYTECODE nlattr is currently re-found every time when
      the "dump()" is re-started.
      
      In a latter patch, it will also need to parse the new
      INET_DIAG_REQ_SK_BPF_STORAGES nlattr to learn the map_fds. Thus, this
      patch takes this chance to store the parsed nlattr in cb->data
      during the "start" time of a dump.
      
      By doing this, the "bc" argument also becomes unnecessary
      and is removed.  Also, the two copies of the INET_DIAG_REQ_BYTECODE
      parsing-audit logic between compat/current version can be
      consolidated to one.
      Signed-off-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarSong Liu <songliubraving@fb.com>
      Link: https://lore.kernel.org/bpf/20200225230415.1975555-1-kafai@fb.com
      0df6d328
    • Martin KaFai Lau's avatar
      inet_diag: Refactor inet_sk_diag_fill(), dump(), and dump_one() · 5682d393
      Martin KaFai Lau authored
      In a latter patch, there is a need to update "cb->min_dump_alloc"
      in inet_sk_diag_fill() as it learns the diffierent bpf_sk_storages
      stored in a sk while dumping all sk(s) (e.g. tcp_hashinfo).
      
      The inet_sk_diag_fill() currently does not take the "cb" as an argument.
      One of the reason is inet_sk_diag_fill() is used by both dump_one()
      and dump() (which belong to the "struct inet_diag_handler".  The dump_one()
      interface does not pass the "cb" along.
      
      This patch is to make dump_one() pass a "cb".  The "cb" is created in
      inet_diag_cmd_exact().  The "nlh" and "in_skb" are stored in "cb" as
      the dump() interface does.  The total number of args in
      inet_sk_diag_fill() is also cut from 10 to 7 and
      that helps many callers to pass fewer args.
      
      In particular,
      "struct user_namespace *user_ns", "u32 pid", and "u32 seq"
      can be replaced by accessing "cb->nlh" and "cb->skb".
      
      A similar argument reduction is also made to
      inet_twsk_diag_fill() and inet_req_diag_fill().
      
      inet_csk_diag_dump() and inet_csk_diag_fill() are also removed.
      They are mostly equivalent to inet_sk_diag_fill().  Their repeated
      usages are very limited.  Thus, inet_sk_diag_fill() is directly used
      in those occasions.
      Signed-off-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarSong Liu <songliubraving@fb.com>
      Link: https://lore.kernel.org/bpf/20200225230409.1975173-1-kafai@fb.com
      5682d393
    • Gustavo A. R. Silva's avatar
      bpf: Replace zero-length array with flexible-array member · d7f10df8
      Gustavo A. R. Silva authored
      The current codebase makes use of the zero-length array language
      extension to the C90 standard, but the preferred mechanism to declare
      variable-length types such as these ones is a flexible array member[1][2],
      introduced in C99:
      
      struct foo {
              int stuff;
              struct boo array[];
      };
      
      By making use of the mechanism above, we will get a compiler warning
      in case the flexible array does not occur last in the structure, which
      will help us prevent some kind of undefined behavior bugs from being
      inadvertently introduced[3] to the codebase from now on.
      
      Also, notice that, dynamic memory allocations won't be affected by
      this change:
      
      "Flexible array members have incomplete type, and so the sizeof operator
      may not be applied. As a quirk of the original implementation of
      zero-length arrays, sizeof evaluates to zero."[1]
      
      This issue was found with the help of Coccinelle.
      
      [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
      [2] https://github.com/KSPP/linux/issues/21
      [3] commit 76497732 ("cxgb3/l2t: Fix undefined behaviour")
      Signed-off-by: default avatarGustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarSong Liu <songliubraving@fb.com>
      Link: https://lore.kernel.org/bpf/20200227001744.GA3317@embeddedor
      d7f10df8
  2. 26 Feb, 2020 8 commits
  3. 25 Feb, 2020 26 commits