1. 19 Apr, 2018 10 commits
    • Martin KaFai Lau's avatar
      bpf: btf: Add BTF support to libbpf · 8a138aed
      Martin KaFai Lau authored
      If the ".BTF" elf section exists, libbpf will try to create
      a btf_fd (through BPF_BTF_LOAD).  If that fails, it will still
      continue loading the bpf prog/map without the BTF.
      
      If the bpf_object has a BTF loaded, it will create a map with the btf_fd.
      libbpf will try to figure out the btf_key_id and btf_value_id of a map by
      finding the BTF type with name "<map_name>_key" and "<map_name>_value".
      If they cannot be found, it will continue without using the BTF.
      Signed-off-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Acked-by: default avatarAlexei Starovoitov <ast@fb.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      8a138aed
    • Martin KaFai Lau's avatar
      bpf: btf: Sync bpf.h and btf.h to tools/ · 3bd86a84
      Martin KaFai Lau authored
      This patch sync up the bpf.h and btf.h to tools/
      Signed-off-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Acked-by: default avatarAlexei Starovoitov <ast@fb.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      3bd86a84
    • Martin KaFai Lau's avatar
      bpf: btf: Add pretty print support to the basic arraymap · a26ca7c9
      Martin KaFai Lau authored
      This patch adds pretty print support to the basic arraymap.
      Support for other bpf maps can be added later.
      
      This patch adds new attrs to the BPF_MAP_CREATE command to allow
      specifying the btf_fd, btf_key_id and btf_value_id.  The
      BPF_MAP_CREATE can then associate the btf to the map if
      the creating map supports BTF.
      
      A BTF supported map needs to implement two new map ops,
      map_seq_show_elem() and map_check_btf().  This patch has
      implemented these new map ops for the basic arraymap.
      
      It also adds file_operations, bpffs_map_fops, to the pinned
      map such that the pinned map can be opened and read.
      After that, the user has an intuitive way to do
      "cat bpffs/pathto/a-pinned-map" instead of getting
      an error.
      
      bpffs_map_fops should not be extended further to support
      other operations.  Other operations (e.g. write/key-lookup...)
      should be realized by the userspace tools (e.g. bpftool) through
      the BPF_OBJ_GET_INFO_BY_FD, map's lookup/update interface...etc.
      Follow up patches will allow the userspace to obtain
      the BTF from a map-fd.
      
      Here is a sample output when reading a pinned arraymap
      with the following map's value:
      
      struct map_value {
      	int count_a;
      	int count_b;
      };
      
      cat /sys/fs/bpf/pinned_array_map:
      
      0: {1,2}
      1: {3,4}
      2: {5,6}
      ...
      Signed-off-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Acked-by: default avatarAlexei Starovoitov <ast@fb.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      a26ca7c9
    • Martin KaFai Lau's avatar
      bpf: btf: Add BPF_OBJ_GET_INFO_BY_FD support to BTF fd · 60197cfb
      Martin KaFai Lau authored
      This patch adds BPF_OBJ_GET_INFO_BY_FD support to BTF fd.
      The original BTF data, which was used to create the BTF fd during
      the earlier BPF_BTF_LOAD call, will be returned.
      
      The userspace is expected to allocate buffer
      to info.info and the buffer size is set to info.info_len before
      calling BPF_OBJ_GET_INFO_BY_FD.
      
      The original BTF data is copied to the userspace buffer (info.info).
      Only upto the user's specified info.info_len will be copied.
      
      The original BTF data size is set to info.info_len.  The userspace
      needs to check if it is bigger than its allocated buffer size.
      If it is, the userspace should realloc with the kernel-returned
      info.info_len and call the BPF_OBJ_GET_INFO_BY_FD again.
      Signed-off-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Acked-by: default avatarAlexei Starovoitov <ast@fb.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      60197cfb
    • Martin KaFai Lau's avatar
      bpf: btf: Add BPF_BTF_LOAD command · f56a653c
      Martin KaFai Lau authored
      This patch adds a BPF_BTF_LOAD command which
      1) loads and verifies the BTF (implemented in earlier patches)
      2) returns a BTF fd to userspace.  In the next patch, the
         BTF fd can be specified during BPF_MAP_CREATE.
      
      It currently limits to CAP_SYS_ADMIN.
      Signed-off-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Acked-by: default avatarAlexei Starovoitov <ast@fb.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      f56a653c
    • Martin KaFai Lau's avatar
      bpf: btf: Add pretty print capability for data with BTF type info · b00b8dae
      Martin KaFai Lau authored
      This patch adds pretty print capability for data with BTF type info.
      The current usage is to allow pretty print for a BPF map.
      
      The next few patches will allow a read() on a pinned map with BTF
      type info for its key and value.
      
      This patch uses the seq_printf() infra.
      Signed-off-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Acked-by: default avatarAlexei Starovoitov <ast@fb.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      b00b8dae
    • Martin KaFai Lau's avatar
      bpf: btf: Check members of struct/union · 179cde8c
      Martin KaFai Lau authored
      This patch checks a few things of struct's members:
      
      1) It has a valid size (e.g. a "const void" is invalid)
      2) A member's size (+ its member's offset) does not exceed
         the containing struct's size.
      3) The member's offset satisfies the alignment requirement
      
      The above can only be done after the needs_resolve member's type
      is resolved.  Hence, the above is done together in
      btf_struct_resolve().
      
      Each possible member's type (e.g. int, enum, modifier...) implements
      the check_member() ops which will be called from btf_struct_resolve().
      Signed-off-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Acked-by: default avatarAlexei Starovoitov <ast@fb.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      179cde8c
    • Martin KaFai Lau's avatar
      bpf: btf: Validate type reference · eb3f595d
      Martin KaFai Lau authored
      After collecting all btf_type in the first pass in an earlier patch,
      the second pass (in this patch) can validate the reference types
      (e.g. the referring type does exist and it does not refer to itself).
      
      While checking the reference type, it also gathers other information (e.g.
      the size of an array).  This info will be useful in checking the
      struct's members in a later patch.  They will also be useful in doing
      pretty print later.
      Signed-off-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Acked-by: default avatarAlexei Starovoitov <ast@fb.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      eb3f595d
    • Martin KaFai Lau's avatar
      bpf: btf: Introduce BPF Type Format (BTF) · 69b693f0
      Martin KaFai Lau authored
      This patch introduces BPF type Format (BTF).
      
      BTF (BPF Type Format) is the meta data format which describes
      the data types of BPF program/map.  Hence, it basically focus
      on the C programming language which the modern BPF is primary
      using.  The first use case is to provide a generic pretty print
      capability for a BPF map.
      
      BTF has its root from CTF (Compact C-Type format).  To simplify
      the handling of BTF data, BTF removes the differences between
      small and big type/struct-member.  Hence, BTF consistently uses u32
      instead of supporting both "one u16" and "two u32 (+padding)" in
      describing type and struct-member.
      
      It also raises the number of types (and functions) limit
      from 0x7fff to 0x7fffffff.
      
      Due to the above changes,  the format is not compatible to CTF.
      Hence, BTF starts with a new BTF_MAGIC and version number.
      
      This patch does the first verification pass to the BTF.  The first
      pass checks:
      1. meta-data size (e.g. It does not go beyond the total btf's size)
      2. name_offset is valid
      3. Each BTF_KIND (e.g. int, enum, struct....) does its
         own check of its meta-data.
      
      Some other checks, like checking a struct's member is referring
      to a valid type, can only be done in the second pass.  The second
      verification pass will be implemented in the next patch.
      Signed-off-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Acked-by: default avatarAlexei Starovoitov <ast@fb.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      69b693f0
    • Jesper Dangaard Brouer's avatar
      bpf: reserve xdp_frame size in xdp headroom · 97e19cce
      Jesper Dangaard Brouer authored
      Commit 6dfb970d ("xdp: avoid leaking info stored in frame data on
      page reuse") tried to allow user/bpf_prog to (re)use area used by
      xdp_frame (stored in frame headroom), by memset clearing area when
      bpf_xdp_adjust_head give bpf_prog access to headroom area.
      
      The mentioned commit had two bugs. (1) Didn't take bpf_xdp_adjust_meta
      into account. (2) a combination of bpf_xdp_adjust_head calls, where
      xdp->data is moved into xdp_frame section, can cause clearing
      xdp_frame area again for area previously granted to bpf_prog.
      
      After discussions with Daniel, we choose to implement a simpler
      solution to the problem, which is to reserve the headroom used by
      xdp_frame info.
      
      This also avoids the situation where bpf_prog is allowed to adjust/add
      headers, and then XDP_REDIRECT later drops the packet due to lack of
      headroom for the xdp_frame.  This would likely confuse the end-user.
      
      Fixes: 6dfb970d ("xdp: avoid leaking info stored in frame data on page reuse")
      Signed-off-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      97e19cce
  2. 18 Apr, 2018 19 commits
  3. 17 Apr, 2018 11 commits
    • Lorenzo Bianconi's avatar
      ipv6: send netlink notifications for manually configured addresses · a2d481b3
      Lorenzo Bianconi authored
      Send a netlink notification when userspace adds a manually configured
      address if DAD is enabled and optimistic flag isn't set.
      Moreover send RTM_DELADDR notifications for tentative addresses.
      
      Some userspace applications (e.g. NetworkManager) are interested in
      addr netlink events albeit the address is still in tentative state,
      however events are not sent if DAD process is not completed.
      If the address is added and immediately removed userspace listeners
      are not notified. This behaviour can be easily reproduced by using
      veth interfaces:
      
      $ ip -b - <<EOF
      > link add dev vm1 type veth peer name vm2
      > link set dev vm1 up
      > link set dev vm2 up
      > addr add 2001:db8:a:b:1:2:3:4/64 dev vm1
      > addr del 2001:db8:a:b:1:2:3:4/64 dev vm1
      EOF
      
      This patch reverts the behaviour introduced by the commit f784ad3d
      ("ipv6: do not send RTM_DELADDR for tentative addresses")
      Suggested-by: default avatarThomas Haller <thaller@redhat.com>
      Signed-off-by: default avatarLorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a2d481b3
    • Ganesh Goudar's avatar
      cxgb4vf: display pause settings · a64dcddc
      Ganesh Goudar authored
      Add support to display pause settings
      Signed-off-by: default avatarGanesh Goudar <ganeshgr@chelsio.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a64dcddc
    • Hangbin Liu's avatar
      vxlan: add ttl inherit support · 72f6d71e
      Hangbin Liu authored
      Like tos inherit, ttl inherit should also means inherit the inner protocol's
      ttl values, which actually not implemented in vxlan yet.
      
      But we could not treat ttl == 0 as "use the inner TTL", because that would be
      used also when the "ttl" option is not specified and that would be a behavior
      change, and breaking real use cases.
      
      So add a different attribute IFLA_VXLAN_TTL_INHERIT when "ttl inherit" is
      specified with ip cmd.
      Reported-by: default avatarJianlin Shi <jishi@redhat.com>
      Suggested-by: default avatarJiri Benc <jbenc@redhat.com>
      Signed-off-by: default avatarHangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      72f6d71e
    • Samuel Mendoza-Jonas's avatar
      net/ncsi: Refactor MAC, VLAN filters · 062b3e1b
      Samuel Mendoza-Jonas authored
      The NCSI driver defines a generic ncsi_channel_filter struct that can be
      used to store arbitrarily formatted filters, and several generic methods
      of accessing data stored in such a filter.
      However in both the driver and as defined in the NCSI specification
      there are only two actual filters: VLAN ID filters and MAC address
      filters. The splitting of the MAC filter into unicast, multicast, and
      mixed is also technically not necessary as these are stored in the same
      location in hardware.
      
      To save complexity, particularly in the set up and accessing of these
      generic filters, remove them in favour of two specific structs. These
      can be acted on directly and do not need several generic helper
      functions to use.
      
      This also fixes a memory error found by KASAN on ARM32 (which is not
      upstream yet), where response handlers accessing a filter's data field
      could write past allocated memory.
      
      [  114.926512] ==================================================================
      [  114.933861] BUG: KASAN: slab-out-of-bounds in ncsi_configure_channel+0x4b8/0xc58
      [  114.941304] Read of size 2 at addr 94888558 by task kworker/0:2/546
      [  114.947593]
      [  114.949146] CPU: 0 PID: 546 Comm: kworker/0:2 Not tainted 4.16.0-rc6-00119-ge156398bfcad #13
      ...
      [  115.170233] The buggy address belongs to the object at 94888540
      [  115.170233]  which belongs to the cache kmalloc-32 of size 32
      [  115.181917] The buggy address is located 24 bytes inside of
      [  115.181917]  32-byte region [94888540, 94888560)
      [  115.192115] The buggy address belongs to the page:
      [  115.196943] page:9eeac100 count:1 mapcount:0 mapping:94888000 index:0x94888fc1
      [  115.204200] flags: 0x100(slab)
      [  115.207330] raw: 00000100 94888000 94888fc1 0000003f 00000001 9eea2014 9eecaa74 96c003e0
      [  115.215444] page dumped because: kasan: bad access detected
      [  115.221036]
      [  115.222544] Memory state around the buggy address:
      [  115.227384]  94888400: fb fb fb fb fc fc fc fc 04 fc fc fc fc fc fc fc
      [  115.233959]  94888480: 00 00 00 fc fc fc fc fc 00 04 fc fc fc fc fc fc
      [  115.240529] >94888500: 00 00 04 fc fc fc fc fc 00 00 04 fc fc fc fc fc
      [  115.247077]                                             ^
      [  115.252523]  94888580: 00 04 fc fc fc fc fc fc 06 fc fc fc fc fc fc fc
      [  115.259093]  94888600: 00 00 06 fc fc fc fc fc 00 00 04 fc fc fc fc fc
      [  115.265639] ==================================================================
      Reported-by: default avatarJoel Stanley <joel@jms.id.au>
      Signed-off-by: default avatarSamuel Mendoza-Jonas <sam@mendozajonas.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      062b3e1b
    • Eric Biggers's avatar
      KEYS: DNS: limit the length of option strings · c210f7b4
      Eric Biggers authored
      Adding a dns_resolver key whose payload contains a very long option name
      resulted in that string being printed in full.  This hit the WARN_ONCE()
      in set_precision() during the printk(), because printk() only supports a
      precision of up to 32767 bytes:
      
          precision 1000000 too large
          WARNING: CPU: 0 PID: 752 at lib/vsprintf.c:2189 vsnprintf+0x4bc/0x5b0
      
      Fix it by limiting option strings (combined name + value) to a much more
      reasonable 128 bytes.  The exact limit is arbitrary, but currently the
      only recognized option is formatted as "dnserror=%lu" which fits well
      within this limit.
      
      Also ratelimit the printks.
      
      Reproducer:
      
          perl -e 'print "#", "A" x 1000000, "\x00"' | keyctl padd dns_resolver desc @s
      
      This bug was found using syzkaller.
      Reported-by: default avatarMark Rutland <mark.rutland@arm.com>
      Fixes: 4a2d7892 ("DNS: If the DNS server returns an error, allow that to be cached [ver #2]")
      Signed-off-by: default avatarEric Biggers <ebiggers@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c210f7b4
    • Davide Caratti's avatar
    • Stephen Suryaputra's avatar
      ipv6: Count interface receive statistics on the ingress netdev · bdb7cc64
      Stephen Suryaputra authored
      The statistics such as InHdrErrors should be counted on the ingress
      netdev rather than on the dev from the dst, which is the egress.
      Signed-off-by: default avatarStephen Suryaputra <ssuryaextr@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bdb7cc64
    • David Ahern's avatar
      net/ipv6: Make __inet6_bind static · 032234d8
      David Ahern authored
      BPF core gets access to __inet6_bind via ipv6_bpf_stub_impl, so it is
      not invoked directly outside of af_inet6.c. Make it static and move
      inet6_bind after to avoid forward declaration.
      Signed-off-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      032234d8
    • David S. Miller's avatar
      Merge branch 'XDP-redirect-memory-return-API' · 684009d4
      David S. Miller authored
      Jesper Dangaard Brouer says:
      
      ====================
      XDP redirect memory return API
      
      Submitted against net-next, as it contains NIC driver changes.
      
      This patchset works towards supporting different XDP RX-ring memory
      allocators.  As this will be needed by the AF_XDP zero-copy mode.
      
      The patchset uses mlx5 as the sample driver, which gets implemented
      XDP_REDIRECT RX-mode, but not ndo_xdp_xmit (as this API is subject to
      change thought the patchset).
      
      A new struct xdp_frame is introduced (modeled after cpumap xdp_pkt).
      And both ndo_xdp_xmit and the new xdp_return_frame end-up using this.
      
      Support for a driver supplied allocator is implemented, and a
      refurbished version of page_pool is the first return allocator type
      introduced.  This will be a integration point for AF_XDP zero-copy.
      
      The mlx5 driver evolve into using the page_pool, and see a performance
      increase (with ndo_xdp_xmit out ixgbe driver) from 6Mpps to 12Mpps.
      
      The patchset stop at 16 patches (one over limit), but more API changes
      are planned.  Specifically extending ndo_xdp_xmit and xdp_return_frame
      APIs to support bulking.  As this will address some known limits.
      
      V2: Updated according to Tariq's feedback
      V3: Updated based on feedback from Jason Wang and Alex Duyck
      V4: Updated based on feedback from Tariq and Jason
      V5: Fix SPDX license, add Tariq's reviews, improve patch desc for perf test
      V6: Updated based on feedback from Eric Dumazet and Alex Duyck
      V7: Adapt to i40e that got XDP_REDIRECT support in-between
      V8:
       Updated based on feedback kbuild test robot, and adjust for mlx5 changes
       page_pool only compiled into kernel when drivers Kconfig 'select' feature
      V9:
       Remove some inline statements, let compiler decide what to inline
       Fix return value in virtio_net driver
       Adjust for mlx5 changes in-between submissions
      V10:
       Minor adjust for mlx5 requested by Tariq
       Resubmit against net-next
      V11: avoid leaking info stored in frame data on page reuse
      ====================
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      684009d4
    • Jesper Dangaard Brouer's avatar
      xdp: avoid leaking info stored in frame data on page reuse · 6dfb970d
      Jesper Dangaard Brouer authored
      The bpf infrastructure and verifier goes to great length to avoid
      bpf progs leaking kernel (pointer) info.
      
      For queueing an xdp_buff via XDP_REDIRECT, xdp_frame info stores
      kernel info (incl pointers) in top part of frame data (xdp->data_hard_start).
      Checks are in place to assure enough headroom is available for this.
      
      This info is not cleared, and if the frame is reused, then a
      malicious user could use bpf_xdp_adjust_head helper to move
      xdp->data into this area.  Thus, making this area readable.
      
      This is not super critical as XDP progs requires root or
      CAP_SYS_ADMIN, which are privileged enough for such info.  An
      effort (is underway) towards moving networking bpf hooks to the
      lesser privileged mode CAP_NET_ADMIN, where leaking such info
      should be avoided.  Thus, this patch to clear the info when
      needed.
      Signed-off-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6dfb970d
    • Jesper Dangaard Brouer's avatar
      xdp: transition into using xdp_frame for ndo_xdp_xmit · 44fa2dbd
      Jesper Dangaard Brouer authored
      Changing API ndo_xdp_xmit to take a struct xdp_frame instead of struct
      xdp_buff.  This brings xdp_return_frame and ndp_xdp_xmit in sync.
      
      This builds towards changing the API further to become a bulk API,
      because xdp_buff is not a queue-able object while xdp_frame is.
      
      V4: Adjust for commit 59655a5b ("tuntap: XDP_TX can use native XDP")
      V7: Adjust for commit d9314c47 ("i40e: add support for XDP_REDIRECT")
      Signed-off-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      44fa2dbd