1. 15 Oct, 2018 14 commits
    • bpf: bpftool, add flag to allow non-compat map definitions · c034a177
      John Fastabend authored
      Multiple map definition structures exist, and users may have non-zero
      fields in their definition that are not recognized by bpftool and
      libbpf. The normal behavior is then to fail loading the map. Although
      this is a good default, users may still want to load the map
      for debugging or other reasons. This patch adds a --mapcompat flag
      that can be used to override the default behavior and allow loading
      the map even when it has additional non-zero fields.
      
      For now the only user is 'bpftool prog'; we can switch over other
      subcommands as needed. The library now exposes an API that consumes
      a flags field, but I kept the original API around as well in case
      users of the API don't want to expose this. The flags field is an
      int in case we need more control over how the API call handles
      errors, features, etc. in the future.
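The API shape described above (a new entry point that takes a flags int, with the original entry point kept as a strict wrapper) might look roughly like the sketch below. Every name here (`map_def`, `load_map`, `load_map_flags`, `LOAD_RELAX_COMPAT`) is hypothetical and chosen for illustration; none of them are the actual libbpf or bpftool symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag: tolerate non-zero bytes past the known fields. */
#define LOAD_RELAX_COMPAT 0x01

/* Fields this illustrative loader understands (not the libbpf layout). */
struct map_def {
    unsigned int type;
    unsigned int key_size;
    unsigned int value_size;
    unsigned int max_entries;
};

/*
 * Copy the known fields out of a raw definition of raw_len bytes.
 * Returns 0 on success, or -1 if the definition is too short or has
 * unrecognized non-zero trailing bytes and LOAD_RELAX_COMPAT is not set.
 */
int load_map_flags(const unsigned char *raw, size_t raw_len,
                   struct map_def *out, int flags)
{
    size_t i;

    assert(raw != NULL && out != NULL);
    if (raw_len < sizeof(*out))
        return -1;

    /* Strict mode rejects any non-zero byte beyond the known fields. */
    for (i = sizeof(*out); i < raw_len; i++) {
        if (raw[i] != 0 && !(flags & LOAD_RELAX_COMPAT))
            return -1;
    }

    memcpy(out, raw, sizeof(*out));
    return 0;
}

/* Original entry point kept for existing callers: strict by default. */
int load_map(const unsigned char *raw, size_t raw_len, struct map_def *out)
{
    return load_map_flags(raw, raw_len, out, 0);
}
```

Keeping the flags argument as a plain int leaves room for further bits later without another signature change, which matches the reasoning given in the commit message.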
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: bpftool, add support for attaching programs to maps · b7d3826c
      John Fastabend authored
      Sock map/hash introduce support for attaching programs to maps. To
      date I have been doing this with custom tooling but this is less than
      ideal as we shift to using bpftool as the single CLI for our BPF uses.
      This patch adds new sub commands 'attach' and 'detach' to the 'prog'
      command to attach programs to maps and then detach them.
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • Merge branch 'ipv6_sk_lookup_fixes' · 7d1f12b8
      Alexei Starovoitov authored
      Joe Stringer says:
      
      ====================
      This series includes a couple of fixups for the IPv6 socket lookup
      helper, to make the API more consistent (always supply all arguments in
      network byte-order) and to allow its use when IPv6 is compiled as a
      module.
      ====================
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Fix IPv6 dport byte-order in bpf_sk_lookup · 5ef0ae84
      Joe Stringer authored
      Commit 6acc9b43 ("bpf: Add helper to retrieve socket in BPF")
      mistakenly passed the destination port in network byte-order to the IPv6
      TCP/UDP socket lookup functions, which meant that BPF writers would need
      to either manually swap the byte-order of this field or otherwise IPv6
      sockets could not be located via this helper.
      
      Fix the issue by swapping the byte-order appropriately in the helper.
      This also makes the API more consistent with the IPv4 version.
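The kind of fix described above can be sketched in user-space C: the caller always supplies the port in network byte order, and the helper converts it before calling a lookup routine that expects host byte order. The functions `lookup_by_host_port` and `helper_lookup` are hypothetical stand-ins, not the kernel functions touched by this commit.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a lookup routine that expects the
 * destination port in host byte order. It pretends that a socket
 * is listening on port 8080.
 */
int lookup_by_host_port(uint16_t hnum)
{
    return hnum == 8080;
}

/*
 * Helper whose API takes the port in network byte order and swaps
 * it to host order itself, so callers never need to do so manually.
 */
int helper_lookup(uint16_t dport_be)
{
    return lookup_by_host_port(ntohs(dport_be));
}
```

With the conversion inside the helper, callers can pass the port exactly as it appears in the packet header.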
      
      Fixes: 6acc9b43 ("bpf: Add helper to retrieve socket in BPF")
      Signed-off-by: Joe Stringer <joe@wand.net.nz>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Allow sk_lookup with IPv6 module · 8a615c6b
      Joe Stringer authored
      This is a more complete fix than d71019b5 ("net: core: Fix build
      with CONFIG_IPV6=m"), so that IPv6 sockets may be looked up if the IPv6
      module is loaded (not just if it's compiled in).
      Signed-off-by: Joe Stringer <joe@wand.net.nz>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • Merge branch 'sockmap_and_ktls' · d04fb13c
      Alexei Starovoitov authored
      Daniel Borkmann says:
      
      ====================
      This work adds a generic sk_msg layer and converts both sockmap
      and later ktls over to make use of it as a common data structure
      for application data (similarly as sk_buff for network packets).
      With that in place, the sk_msg framework spans across the ULP layer
      in the kernel and allows for introspection or filtering of L7
      data with the help of BPF programs operating on a common input
      context.
      
      In a second step, we enable the latter for ktls which was previously
      not possible, meaning, ktls and sk_msg verdict programs were
      mutually exclusive in the ULP layer which created challenges for
      the orchestrator when trying to apply TCP based policy, for
      example. Leveraging the prior consolidation we can finally overcome
      this limitation.
      
      Note, there's no change in behavior when ktls is not used in
      combination with BPF, and also no change in behavior for stand
      alone sockmap. The kselftest suites for ktls, sockmap and ktls
      with sockmap combined also run through successfully. For further
      details please see individual patches.
      
      Thanks!
      
      v1 -> v2:
        - Removed leftover comment spotted by Alexei
        - Improved commit messages, rebase
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf, doc: add maintainers entry to related files · eea0d2ad
      Daniel Borkmann authored
      Add a MAINTAINERS entry to the skmsg and related files such that
      patches, features, bug reports land with the right Cc.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: add tls support for testing in test_sockmap · e9dd9047
      John Fastabend authored
      This adds a --ktls option to test_sockmap in order to enable the
      combination of ktls and sockmap to run, which makes for another
      batch of 648 test cases for both in combination.
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • tls: add bpf support to sk_msg handling · d3b18ad3
      John Fastabend authored
      This work adds BPF sk_msg verdict program support to kTLS
      allowing BPF and kTLS to be combined together. Previously kTLS
      and sk_msg verdict programs were mutually exclusive in the
      ULP layer which created challenges for the orchestrator when
      trying to apply TCP based policy, for example. To resolve this,
      leveraging the work from previous patches that consolidates
      the use of sk_msg, we can finally enable BPF sk_msg verdict
      programs so they continue to run after the kTLS socket is
      created. No change in behavior when kTLS is not used in
      combination with BPF, the kselftest suite for kTLS also runs
      successfully.
      
      Joint work with Daniel.
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • tls: replace poll implementation with read hook · 924ad65e
      John Fastabend authored
      Instead of re-implementing the poll routine and using the poll
      callback to trigger reads from kTLS, we reuse the stream_memory_read
      callback, which is simpler and achieves the same. This helps to align
      sockmap and kTLS so we can more easily embed BPF in kTLS.
      
      Joint work with Daniel.
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • tls: convert to generic sk_msg interface · d829e9c4
      Daniel Borkmann authored
      Convert kTLS over to make use of the sk_msg interface for plaintext and
      encrypted scatter-gather data, so it reuses all the sk_msg helpers
      and data structures, which in a later step makes it possible to glue
      this to BPF.
      
      This also allows removing quite a few open-coded helpers which
      are covered by the sk_msg API. Recent changes in kTLS, 80ece6a0
      ("tls: Remove redundant vars from tls record structure") and
      4e6d4720 ("tls: Add support for inplace records encryption"),
      changed the data path handling a bit; while we've kept the latter
      optimization intact, we had to undo the former change to better
      fit the sk_msg model, hence the sg_aead_in and sg_aead_out have
      been brought back and are linked into the sk_msg sgs. Now the kTLS
      record contains a msg_plaintext and msg_encrypted sk_msg each.
      
      In the original code, zerocopy_from_iter() was used from the TX
      path but also from the RX path. For the strparser skb-based RX path,
      we've left the zerocopy_from_iter() in decrypt_internal() mostly
      untouched, meaning it has been moved into tls_setup_from_iter()
      with the charging logic removed (as it is not used from RX). Given
      the RX path is not based on sk_msg objects, we haven't pursued setting
      up a dummy sk_msg to call into sk_msg_zerocopy_from_iter(), but it
      could be an option to pursue in a later step.
      
      Joint work with John.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf, sockmap: convert to generic sk_msg interface · 604326b4
      Daniel Borkmann authored
      Add a generic sk_msg layer, and convert current sockmap and later
      kTLS over to make use of it. While sk_buff handles network packet
      representation from netdevice up to socket, sk_msg handles data
      representation from application to socket layer.
      
      This means that sk_msg framework spans across ULP users in the
      kernel, and enables features such as introspection or filtering
      of data with the help of BPF programs that operate on this data
      structure.
      
      The latter becomes particularly useful for kTLS, where data encryption
      is deferred into the kernel, which enables the kernel to
      perform L7 introspection and policy based on BPF for TLS connections
      where the record is encrypted after BPF has run and come to
      a verdict. To get there, the first step is to transform the open
      coding of scatter-gather list handling into a common core framework
      that subsystems can use.
      
      The code itself has been split and refactored into three bigger
      pieces: i) the generic sk_msg API which deals with managing the
      scatter gather ring, providing helpers for walking and mangling,
      transferring application data from user space into it, and preparing
      it for BPF pre/post-processing, ii) the plain sock map itself
      where sockets can be attached to or detached from; these bits
      are independent of i) which can now be used also without sock
      map, and iii) the integration with plain TCP as one protocol
      to be used for processing L7 application data (later this could
      e.g. also be extended to other protocols like UDP). The semantics
      are the same as with the old sock map code, so there is no change
      in user-facing behavior or APIs. Pursuing this work
      also helped find a number of bugs in the old sockmap code
      that we've already fixed in earlier commits. The test_sockmap
      kselftest suite passes through fine as well.
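To make the idea of a "scatter gather ring" with helpers for adding and walking elements more concrete, here is a small user-space sketch. It is an illustration under simplifying assumptions only: the types and functions (`sg_elem`, `msg_ring`, `msg_ring_add`, `msg_ring_pop`, `msg_ring_walk`) and the ring capacity are hypothetical and do not reproduce the kernel's sk_msg structure or API.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SLOTS 8 /* illustrative capacity; one slot always stays free */

/* One scatter-gather element: a reference to a buffer, not a copy. */
struct sg_elem {
    const void *data;
    size_t len;
};

struct msg_ring {
    struct sg_elem elems[RING_SLOTS];
    unsigned int start; /* first used slot */
    unsigned int end;   /* one past the last used slot */
    size_t size;        /* total bytes referenced by used slots */
};

static unsigned int ring_next(unsigned int i)
{
    return (i + 1) % RING_SLOTS;
}

int msg_ring_full(const struct msg_ring *m)
{
    return ring_next(m->end) == m->start;
}

/* Append a buffer reference; returns 0 on success, -1 if the ring is full. */
int msg_ring_add(struct msg_ring *m, const void *data, size_t len)
{
    assert(m != NULL);
    if (msg_ring_full(m))
        return -1;
    m->elems[m->end].data = data;
    m->elems[m->end].len = len;
    m->end = ring_next(m->end);
    m->size += len;
    return 0;
}

/* Drop the oldest element (e.g. once it was sent); returns its length. */
size_t msg_ring_pop(struct msg_ring *m)
{
    size_t len;

    assert(m != NULL);
    if (m->start == m->end)
        return 0; /* ring is empty */
    len = m->elems[m->start].len;
    m->start = ring_next(m->start);
    m->size -= len;
    return len;
}

/* Walk the used slots from start to end and sum their lengths. */
size_t msg_ring_walk(const struct msg_ring *m)
{
    size_t total = 0;
    unsigned int i;

    for (i = m->start; i != m->end; i = ring_next(i))
        total += m->elems[i].len;
    return total;
}
```

Because the ring only stores references and indices wrap around, elements can be consumed from the front and appended at the back without moving any data.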
      
      Joint work with John.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • tcp, ulp: remove ulp bits from sockmap · 1243a51f
      Daniel Borkmann authored
      In order to prepare sockmap logic to be used in combination with kTLS
      we need to detangle it from ULP, and further split it in later commits
      into a generic API.
      
      Joint work with John.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • tcp, ulp: enforce sock_owned_by_me upon ulp init and cleanup · 8b9088f8
      Daniel Borkmann authored
      Whenever the ULP data on the socket is mangled, enforce that the
      caller has the socket lock held as otherwise things may race with
      initialization and cleanup callbacks from ulp ops as both would
      mangle internal socket state.
      
      Joint work with John.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  2. 14 Oct, 2018 1 commit
  3. 13 Oct, 2018 1 commit
  4. 11 Oct, 2018 4 commits
  5. 10 Oct, 2018 13 commits
  6. 09 Oct, 2018 2 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · 071a234a
      David S. Miller authored
      Alexei Starovoitov says:
      
      ====================
      pull-request: bpf-next 2018-10-08
      
      The following pull-request contains BPF updates for your *net-next* tree.
      
      The main changes are:
      
      1) sk_lookup_[tcp|udp] and sk_release helpers from Joe Stringer which allow
      BPF programs to perform lookups for sockets in a network namespace. This would
      allow programs to determine early on in processing whether the stack is
      expecting to receive the packet, and perform some action (eg drop,
      forward somewhere) based on this information.
      
      2) per-cpu cgroup local storage from Roman Gushchin.
      Per-cpu cgroup local storage is very similar to simple cgroup storage
      except all the data is per-cpu. The main goal of the per-cpu variant is to
      implement super fast counters (e.g. packet counters), which require
      neither lookups nor atomic operations in the fast path.
      An example of these hybrid counters is in selftests/bpf/netcnt_prog.c
      
      3) allow HW offload of programs with BPF-to-BPF function calls from Quentin Monnet
      
      4) support more than 64-byte key/value in HW offloaded BPF maps from Jakub Kicinski
      
      5) rename of libbpf interfaces from Andrey Ignatov.
      libbpf is maturing as a library and should follow good practices in
      library design and implementation to play well with other libraries.
      This patch set brings consistent naming convention to global symbols.
      
      6) relicense libbpf as LGPL-2.1 OR BSD-2-Clause from Alexei Starovoitov
      to let Apache2 projects use libbpf
      
      7) various AF_XDP fixes from Björn and Magnus
      ====================
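Item 2 of the pull request above motivates per-cpu storage with counters that need no atomic operations in the fast path. The idea can be sketched in plain C as one slot per CPU: each CPU only ever writes its own slot, and a reader sums all slots. This is a user-space illustration with hypothetical names (`percpu_counter`, `counter_inc`, `counter_read`, `DEMO_NR_CPUS`); it does not use the BPF cgroup storage API and covers only the per-cpu part, not the hybrid scheme of the selftest.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NR_CPUS 4 /* illustrative CPU count */

/* One counter slot per CPU. */
struct percpu_counter {
    uint64_t slot[DEMO_NR_CPUS];
};

/*
 * Fast path: a plain add on the caller's own slot. No atomic operation
 * is needed under the assumption that only that CPU writes this slot.
 */
void counter_inc(struct percpu_counter *c, unsigned int cpu, uint64_t n)
{
    assert(cpu < DEMO_NR_CPUS);
    c->slot[cpu] += n;
}

/* Slow path: sum all slots to obtain the overall value. */
uint64_t counter_read(const struct percpu_counter *c)
{
    uint64_t sum = 0;
    unsigned int cpu;

    for (cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
        sum += c->slot[cpu];
    return sum;
}
```

The trade-off is that reads are more expensive and only approximately consistent while writers are active, which is usually acceptable for statistics such as packet counts.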
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next · 9000a457
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter updates for net-next
      
      The following patchset contains Netfilter updates for your net-next tree:
      
      1) Support for matching on ipsec policy already set in the route, from
         Florian Westphal.
      
      2) Split set destruction into deactivate and destroy phase to make it
         fit better into the transaction infrastructure, also from Florian.
         This includes a patch to warn on imbalance when setting the new
         activate and deactivate interfaces.
      
      3) Release transaction list from the workqueue to remove expensive
         synchronize_rcu() from configuration plane path. This speeds up
         configuration plane quite a bit. From Florian Westphal.
      
      4) Add new xfrm/ipsec extension, this new extension allows you to match
         for ipsec tunnel keys such as source and destination address, spi and
         reqid. From Máté Eckl and Florian Westphal.
      
      5) Add secmark support, this includes connsecmark too, patches
         from Christian Gottsche.
      
      6) Allow to specify remaining bytes in xt_quota, from Chenbo Feng.
         One follow up patch to calm a clang warning for this one, from
         Nathan Chancellor.
      
      7) Flush conntrack entries based on layer 3 family, from Kristian Evensen.
      
      8) New revision for cgroups2 to shrink the path field.
      
      9) Get rid of obsolete need_conntrack(), as a result of recent
         demodularization work.
      
      10) Use WARN_ON instead of BUG_ON, from Florian Westphal.
      
      11) Unused exported symbol in nf_nat_ipv4_fn(), from Florian.
      
      12) Remove superfluous check for timeout netlink parser and dump
          functions in layer 4 conntrack helpers.
      
      13) Unnecessary redundant rcu read side locks in NAT redirect,
          from Taehee Yoo.
      
      14) Pass nf_hook_state structure to error handlers, patch from
          Florian Westphal.
      
      15) Remove ->new() interface from layer 4 protocol trackers. Place
          them in the ->packet() interface. From Florian.
      
      16) Place conntrack ->error() handling in the ->packet() interface.
          Patches from Florian Westphal.
      
      17) Remove unused parameter in the pernet initialization path,
          also from Florian.
      
      18) Remove additional parameter to specify layer 3 protocol when
          looking up for protocol tracker. From Florian.
      
      19) Shrink array of layer 4 protocol trackers, from Florian.
      
      20) Check for linear skb only once from the ALG NAT mangling
          codebase, from Taehee Yoo.
      
      21) Use rhashtable_walk_enter() instead of deprecated
          rhashtable_walk_init(), also from Taehee.
      
      22) No need to flush all conntracks when only one single address
          is gone, from Tan Hu.
      
      23) Remove redundant check for NAT flags in flowtable code, from
          Taehee Yoo.
      
      24) Use rhashtable_lookup() instead of rhashtable_lookup_fast()
          from netfilter codebase, since rcu read lock side is already
          assumed in this path.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 08 Oct, 2018 5 commits
    • bpf: fix building without CONFIG_INET · df3f94a0
      Arnd Bergmann authored
      The newly added TCP and UDP handling fails to link when CONFIG_INET
      is disabled:
      
      net/core/filter.o: In function `sk_lookup':
      filter.c:(.text+0x7ff8): undefined reference to `tcp_hashinfo'
      filter.c:(.text+0x7ffc): undefined reference to `tcp_hashinfo'
      filter.c:(.text+0x8020): undefined reference to `__inet_lookup_established'
      filter.c:(.text+0x8058): undefined reference to `__inet_lookup_listener'
      filter.c:(.text+0x8068): undefined reference to `udp_table'
      filter.c:(.text+0x8070): undefined reference to `udp_table'
      filter.c:(.text+0x808c): undefined reference to `__udp4_lib_lookup'
      net/core/filter.o: In function `bpf_sk_release':
      filter.c:(.text+0x82e8): undefined reference to `sock_gen_put'
      
      Wrap the related sections of code in #ifdefs for the config option.
      
      Furthermore, sk_lookup() should always have been marked 'static'; this
      also avoids a warning about a missing prototype when building with
      'make W=1'.
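The pattern described above, guarding code that depends on an optional subsystem and keeping the internal function static, can be sketched as follows. The macro `DEMO_CONFIG_INET` and the functions `demo_inet_lookup`, `demo_sk_lookup` and `demo_helper` are hypothetical; this is not the code from net/core/filter.c.

```c
#include <assert.h>
#include <stddef.h>

#ifdef DEMO_CONFIG_INET
/* Only declared (and linked) when the optional feature is built. */
int demo_inet_lookup(int port);
#endif

/*
 * Internal lookup routine. Marking it static keeps the symbol local to
 * this file, so no separate prototype is needed.
 */
static int demo_sk_lookup(int port)
{
#ifdef DEMO_CONFIG_INET
    return demo_inet_lookup(port);
#else
    (void)port; /* no lookup backend available in this configuration */
    return -1;
#endif
}

/* Public helper that stays available in every configuration. */
int demo_helper(int port)
{
    return demo_sk_lookup(port);
}
```

Built without `DEMO_CONFIG_INET`, the file contains no reference to `demo_inet_lookup`, so it links cleanly and the helper simply reports that nothing was found.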
      
      Fixes: 6acc9b43 ("bpf: Add helper to retrieve socket in BPF")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Joe Stringer <joe@wand.net.nz>
      Acked-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • netfilter: xt_quota: Don't use aligned attribute in sizeof · ffa0a9a5
      Nathan Chancellor authored
      Clang warns:
      
      net/netfilter/xt_quota.c:47:44: warning: 'aligned' attribute ignored
      when parsing type [-Wignored-attributes]
              BUILD_BUG_ON(sizeof(atomic64_t) != sizeof(__aligned_u64));
                                                        ^~~~~~~~~~~~~
      
      Use 'sizeof(__u64)' instead, as the alignment doesn't affect the size
      of the type.
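The claim that alignment does not affect the size of the type can be checked with a short compile-time assertion. The typedef `demo_aligned_u64` below is a hypothetical stand-in for `__aligned_u64`, and the sketch assumes a GCC- or Clang-compatible compiler for `__attribute__((aligned))` and C11 for `_Static_assert`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for __aligned_u64: 64-bit value, 8-byte aligned. */
typedef uint64_t demo_aligned_u64 __attribute__((aligned(8)));

/* Alignment changes where an object may be placed, not how large it is. */
_Static_assert(sizeof(demo_aligned_u64) == sizeof(uint64_t),
               "aligned attribute must not change the size");

/* Small accessors so the sizes can also be inspected at run time. */
size_t demo_plain_size(void)
{
    return sizeof(uint64_t);
}

size_t demo_aligned_size(void)
{
    return sizeof(demo_aligned_u64);
}
```

Since both sizes are identical, comparing against the plain 64-bit type in a size check is equivalent and avoids applying sizeof to an attribute-carrying type name.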
      
      Fixes: e9837e55 ("netfilter: xt_quota: fix the behavior of xt_quota module")
      Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • dpaa2-eth: Don't account Tx confirmation frames on NAPI poll · 68049a5f
      Ioana Ciocoi Radulescu authored
      Until now, both Rx and Tx confirmation frames handled during
      NAPI poll were counted toward the NAPI budget. However, Tx
      confirmations are lighter to process than Rx frames, which can
      skew the amount of work actually done inside one NAPI cycle.
      
      Update the code to only count Rx frames toward the NAPI budget
      and set a separate threshold on how many Tx conf frames can be
      processed in one poll cycle.
      
      The NAPI poll routine stops when either the budget is consumed
      by Rx frames or when Tx confirmation frames reach this threshold.
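The stop condition described above can be modelled with a simplified loop: only Rx frames count toward the budget, while Tx confirmation frames are capped by a separate threshold. This is a sketch and not the dpaa2-eth driver code; the threshold value and all names (`DEMO_TX_CONF_THRESHOLD`, `frame_type`, `poll_result`, `demo_poll`) are assumptions, and the real driver works on hardware queues rather than an array.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_TX_CONF_THRESHOLD 4 /* hypothetical cap per poll cycle */

enum frame_type { FRAME_RX, FRAME_TX_CONF };

struct poll_result {
    int rx_done;      /* Rx frames processed (counted against budget) */
    int tx_conf_done; /* Tx confirmations processed (not budgeted) */
    int consumed;     /* total frames taken from the queue */
};

/*
 * Process frames in order until the queue is empty, the Rx budget is
 * consumed, or the Tx confirmation threshold is reached.
 */
struct poll_result demo_poll(const enum frame_type *frames, int nframes,
                             int budget)
{
    struct poll_result r = {0, 0, 0};

    assert(frames != NULL || nframes == 0);
    while (r.consumed < nframes &&
           r.rx_done < budget &&
           r.tx_conf_done < DEMO_TX_CONF_THRESHOLD) {
        if (frames[r.consumed] == FRAME_RX)
            r.rx_done++;
        else
            r.tx_conf_done++;
        r.consumed++;
    }
    return r;
}
```

In this model a burst of cheap Tx confirmations can no longer use up the budget meant for Rx work, while the separate threshold still bounds the time spent in one poll cycle.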
      Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mscc: ocelot: remove set but not used variable 'phy_mode' · 9e19dabc
      YueHaibing authored
      Fixes gcc '-Wunused-but-set-variable' warning:
      
      drivers/net/ethernet/mscc/ocelot_board.c: In function 'mscc_ocelot_probe':
      drivers/net/ethernet/mscc/ocelot_board.c:262:17: warning:
       variable 'phy_mode' set but not used [-Wunused-but-set-variable]
         enum phy_mode phy_mode;
      
      It has never been used since its introduction in
      commit 71e32a20 ("net: mscc: ocelot: make use of SerDes PHYs for handling their configuration")
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'more-pmtu-selftests' · ee9615be
      David S. Miller authored
      Sabrina Dubroca says:
      
      ====================
      selftests: add more PMTU tests
      
      The current selftests for PMTU cover VTI tunnels, but there's nothing
      about the generation and handling of PMTU exceptions by intermediate
      routers. This series adds and improves existing helpers, then adds
      IPv4 and IPv6 selftests with a setup involving an intermediate router.
      
      Joint work with Stefano Brivio.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>