1. 15 Apr, 2023 2 commits
    • page_pool: allow caching from safely localized NAPI · 8c48eea3
      Jakub Kicinski authored
      Recent patches to mlx5 mentioned a regression when moving from a
      driver-local page pool to only using the generic page pool code.
      Page pool has two recycling paths: (1) the direct one, which runs in
      safe NAPI context (basically consumer context, so producing
      can be lockless); and (2) the one via a ptr_ring, which takes a spin
      lock because the freeing can happen from any CPU, so producer
      and consumer may run concurrently.
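The two paths can be illustrated with a minimal userspace sketch in C. It assumes a toy pool that stores opaque pointers. All names (`struct pool`, `pool_recycle_direct()`, `pool_recycle_ring()`, `pool_alloc()`) are hypothetical, and a C11 `atomic_flag` spinlock stands in for the ptr_ring lock. The kernel refills its cache from the ring in bulk; this consumer takes one entry at a time.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define CACHE_SIZE 4
#define RING_SIZE 8

/* Toy model of a page pool with two return paths (hypothetical names). */
struct pool {
    void *cache[CACHE_SIZE];  /* direct cache: touched only by the consumer */
    size_t cache_count;
    void *ring[RING_SIZE];    /* shared ring: any CPU may produce into it */
    size_t ring_head, ring_tail;
    atomic_flag ring_lock;    /* stands in for the ptr_ring spinlock */
};

static void spin_lock(atomic_flag *l)
{
    while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
        ; /* busy-wait until the holder releases the lock */
}

static void spin_unlock(atomic_flag *l)
{
    atomic_flag_clear_explicit(l, memory_order_release);
}

static void pool_init(struct pool *p)
{
    p->cache_count = 0;
    p->ring_head = p->ring_tail = 0;
    atomic_flag_clear(&p->ring_lock);
}

/* Path (1): the caller must be the consumer itself, so no lock is taken.
 * Returns 0 if the cache is full and the caller must fall back to path (2). */
static int pool_recycle_direct(struct pool *p, void *page)
{
    assert(page != NULL);
    if (p->cache_count == CACHE_SIZE)
        return 0;
    p->cache[p->cache_count++] = page;
    return 1;
}

/* Path (2): may run on any CPU, concurrently with the consumer. */
static int pool_recycle_ring(struct pool *p, void *page)
{
    int ok = 0;

    assert(page != NULL);
    spin_lock(&p->ring_lock);
    if (p->ring_tail - p->ring_head < RING_SIZE) {
        p->ring[p->ring_tail++ % RING_SIZE] = page;
        ok = 1;
    }
    spin_unlock(&p->ring_lock);
    return ok;
}

/* Consumer: lockless cache first, then the locked ring. */
static void *pool_alloc(struct pool *p)
{
    void *page = NULL;

    if (p->cache_count > 0)
        return p->cache[--p->cache_count];
    spin_lock(&p->ring_lock);
    if (p->ring_head != p->ring_tail)
        page = p->ring[p->ring_head++ % RING_SIZE];
    spin_unlock(&p->ring_lock);
    return page;
}
```

Nothing in `pool_recycle_direct()` checks that its caller really is the consumer. Taking the direct path from the wrong context would race with `pool_alloc()`, which is why that context has to be proven first.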
      
      Since the page pool code was added, Eric introduced a revised version
      of deferred skb freeing. TCP skbs are now usually returned to the CPU
      which allocated them, and freed in softirq context. This places the
      freeing (producing of pages back to the pool) enticingly close to
      the allocation (consumer).
      
      If we can prove that we're freeing in the same softirq context in which
      the consumer NAPI will run, lockless use of the cache is perfectly fine;
      there is no need for the lock.
      
      Let drivers link the page pool to a NAPI instance. If the NAPI instance
      is scheduled on the same CPU on which we're freeing, place the pages
      in the direct cache.
      
      With that and a patched bnxt (XDP enabled to engage the page pool; sigh,
      bnxt really needs page pool work :() I see a 2.6% perf boost with
      a TCP stream test (app on a different physical core than softirq).
      
      The CPU use of relevant functions decreases as expected:
      
        page_pool_refill_alloc_cache   1.17% -> 0%
        _raw_spin_lock                 2.41% -> 0.98%
      
      Only consider the lockless path safe when NAPI is scheduled;
      in practice this should cover the majority, if not all, of steady-state
      workloads. It's usually the NAPI kicking in that causes the skb flush.
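The resulting condition can be sketched as a single predicate. This is a hedged model, not kernel code. `struct napi_inst`, its `owner_cpu` field (the CPU the instance is scheduled on, or -1 when it is not scheduled), `struct pool_model` and `can_recycle_direct()` are all hypothetical names. The `napi_safe` flag mirrors the one plumbed through the freeing path in b07a2d97.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real kernel structures differ. */
struct napi_inst {
    int owner_cpu;  /* CPU this instance is scheduled on, or -1 if not scheduled */
};

struct pool_model {
    struct napi_inst *napi;  /* NULL if the driver did not link a NAPI instance */
};

/* Direct (lockless) recycling is allowed only if the freeing context is a
 * normal NAPI context AND the pool's NAPI instance is scheduled on this CPU.
 * Every other case must take the locked ptr_ring path. */
static bool can_recycle_direct(const struct pool_model *pool, bool napi_safe,
                               int this_cpu)
{
    assert(this_cpu >= 0);
    if (!napi_safe || pool->napi == NULL)
        return false;
    return pool->napi->owner_cpu == this_cpu;
}
```

Encoding "not scheduled" as -1 folds both conditions into one comparison: an idle instance can never match a real CPU number.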
      
      The main case we'll miss out on is when the application runs on the same
      CPU as NAPI. In that case we don't use the deferred skb free path.

      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Tested-by: Dragos Tatulea <dtatulea@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: skb: plumb napi state thru skb freeing paths · b07a2d97
      Jakub Kicinski authored
      We maintain a NAPI-local cache of skbs which is fed by napi_consume_skb().
      Going forward we will also try to cache head and data pages.
      Plumb the "are we in a normal NAPI context" information
      deeper into the freeing path, as far as skb_release_data() and
      skb_free_head()/skb_pp_recycle(). The "not normal NAPI context" case
      comes from netpoll, which passes a budget of 0 to reap
      the Tx completions without performing any Rx.
      
      Use "bool napi_safe" rather than a bare "int budget": the further
      we get from NAPI, the more confusing the budget argument may seem
      (particularly whether 0 or MAX is the correct value to pass in
      when not in NAPI).

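A minimal sketch of the plumbing follows, assuming a toy skb that counts which kind of free it reached. All names (`struct skb_model`, `napi_consume_model()` and the other `*_model` helpers) are hypothetical and only echo the kernel functions named above. The point it shows is narrow: the budget is converted to `bool napi_safe` once, at the NAPI boundary, and only the bool travels down the call chain.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the freeing chain; names only echo the kernel's. */
struct skb_model {
    bool pp_recycle;      /* data came from a page pool */
    int napi_safe_frees;  /* counters so the path taken is observable */
    int other_frees;
};

static void pp_recycle_model(struct skb_model *skb, bool napi_safe)
{
    if (napi_safe)
        skb->napi_safe_frees++;  /* may consider the lockless cache */
    else
        skb->other_frees++;      /* must use the safe, locked path */
}

static void free_head_model(struct skb_model *skb, bool napi_safe)
{
    if (skb->pp_recycle)
        pp_recycle_model(skb, napi_safe);
}

static void release_data_model(struct skb_model *skb, bool napi_safe)
{
    free_head_model(skb, napi_safe);
}

/* Entry point: translate the NAPI budget into a bool exactly once.
 * netpoll passes budget == 0, meaning "not a normal NAPI context". */
static void napi_consume_model(struct skb_model *skb, int budget)
{
    assert(budget >= 0);
    release_data_model(skb, budget != 0);
}
```

Below the entry point, no helper has to decide whether 0 or a maximum budget is the "safe" value. It only sees an unambiguous yes or no.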
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Tested-by: Dragos Tatulea <dtatulea@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. 14 Apr, 2023 36 commits
  3. 13 Apr, 2023 2 commits
    • Daniel Borkmann says: · c2865b11
      Jakub Kicinski authored
      ====================
      pull-request: bpf-next 2023-04-13
      
      We've added 260 non-merge commits during the last 36 day(s) which contain
      a total of 356 files changed, 21786 insertions(+), 11275 deletions(-).
      
      The main changes are:
      
      1) Rework BPF verifier log behavior and implement it as a rotating log
         by default with the option to retain old-style fixed log behavior,
         from Andrii Nakryiko.
      
      2) Add support for using {FOU,GUE} encap with an ipip device operating
         in collect_md mode and add a set of BPF kfuncs for controlling encap
         params, from Christian Ehrig.
      
      3) Allow BPF programs to detect at load time whether a particular kfunc
         exists or not, and also add support for this in light skeleton,
         from Alexei Starovoitov.
      
      4) Optimize hashmap lookups when the key size is a multiple of 4,
         from Anton Protopopov.
      
      5) Enable RCU semantics for task BPF kptrs and allow referenced kptr
         tasks to be stored in BPF maps, from David Vernet.
      
      6) Add support for stashing local BPF kptr into a map value via
         bpf_kptr_xchg(). This is useful e.g. for rbtree node creation
         for new cgroups, from Dave Marchevsky.
      
      7) Fix BTF handling of is_int_ptr to skip modifiers to work around
         tracing issues where a program cannot be attached, from Feng Zhou.
      
      8) Migrate a big portion of test_verifier unit tests over to
         test_progs -a verifier_* via inline asm to ease {read,debug}ability,
         from Eduard Zingerman.
      
      9) Several updates to the instruction-set.rst documentation
         which is subject to future IETF standardization
         (https://lwn.net/Articles/926882/), from Dave Thaler.
      
      10) Fix the BPF verifier's 64->32 tnum sub-register known-bits
          propagation in __reg_bound_offset(), from Daniel Borkmann.
      
      11) Add skb bitfield compaction work related to BPF with the overall goal
          to make more of the sk_buff bits optional, from Jakub Kicinski.
      
      12) BPF selftest cleanups for build id extraction which stand on their own,
          independent of the upcoming integration work of build id into struct
          file object, from Jiri Olsa.
      
      13) Add fixes and optimizations for xsk descriptor validation and several
          selftest improvements for xsk sockets, from Kal Conley.
      
      14) Add BPF links for struct_ops and enable switching implementations
          of BPF TCP cong-ctls under a given name by replacing backing
          struct_ops map, from Kui-Feng Lee.
      
      15) Remove a misleading BPF verifier env->bypass_spec_v1 check on variable
          offset stack read as earlier Spectre checks cover this,
          from Luis Gerhorst.
      
      16) Fix issues in copy_from_user_nofault() for BPF and other tracers
          to resemble copy_from_user_nmi() from a safety point of view, from
          Florian Lehner and Alexei Starovoitov.
      
      17) Add a --json-summary option to test_progs so that CI tooling can
          parse test results more easily, from Manu Bretelle.
      
      18) Batch of improvements and refactoring to prep for upcoming
          bpf_local_storage conversion to bpf_mem_cache_{alloc,free} allocator,
          from Martin KaFai Lau.
      
      19) Improve bpftool's visual program dump which produces the control
          flow graph in a DOT format by adding C source inline annotations,
          from Quentin Monnet.
      
      20) Fix attaching fentry/fexit/fmod_ret/lsm to modules by extracting
          the module name from BTF of the target and searching kallsyms of
          the correct module, from Viktor Malik.
      
      21) Improve BPF verifier handling of '<const> <cond> <non_const>'
          to better detect whether branches, in particular jmp32 ones, are
          taken, from Yonghong Song.
      
      22) Allow BPF TCP cong-ctls to write app_limited of struct tcp_sock.
          A built-in cc or one from a kernel module is already able to write
          to app_limited, from Yixin Shen.
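Item 10 is the least self-explanatory entry above, so here is a simplified userspace model of it. The tnum helpers follow the verifier's tracked-number semantics: bits set in `mask` are unknown, and all other bits equal those in `value`. `struct reg`, `bound_offset()` and its `fixed` flag are hypothetical. The before/after variants reflect our reading of the fix, which derives the 32-bit part from the already-refined 64-bit tnum rather than from the original `var_off`. This is not verbatim verifier code.

```c
#include <assert.h>
#include <stdint.h>

/* Tracked number: bits set in mask are unknown; others equal value's bits. */
struct tnum { uint64_t value, mask; };

static struct tnum tnum_make(uint64_t v, uint64_t m)
{
    return (struct tnum){ v, m };
}

static int fls64_model(uint64_t x)  /* 1-based index of highest set bit; 0 if none */
{
    int n = 0;
    while (x) { n++; x >>= 1; }
    return n;
}

/* Smallest tnum covering [min, max]: keep the common high bits. */
static struct tnum tnum_range(uint64_t min, uint64_t max)
{
    int bits = fls64_model(min ^ max);
    if (bits > 63)
        return tnum_make(0, UINT64_MAX);
    uint64_t delta = (1ULL << bits) - 1;
    return tnum_make(min & ~delta, delta);
}

static struct tnum tnum_intersect(struct tnum a, struct tnum b)
{
    uint64_t v = a.value | b.value, mu = a.mask & b.mask;
    return tnum_make(v & ~mu, mu);
}

static struct tnum tnum_or(struct tnum a, struct tnum b)
{
    uint64_t v = a.value | b.value, mu = a.mask | b.mask;
    return tnum_make(v, mu & ~v);
}

static struct tnum tnum_subreg(struct tnum a)        /* low 32 bits only */
{
    return tnum_make(a.value & 0xffffffffULL, a.mask & 0xffffffffULL);
}

static struct tnum tnum_clear_subreg(struct tnum a)  /* high 32 bits only */
{
    return tnum_make(a.value & ~0xffffffffULL, a.mask & ~0xffffffffULL);
}

/* Hypothetical register state with 64- and 32-bit unsigned bounds. */
struct reg {
    struct tnum var_off;
    uint64_t umin, umax;
    uint32_t u32_min, u32_max;
};

/* fixed == 0: subreg taken from the stale var_off (old behavior).
 * fixed != 0: subreg taken from the refined 64-bit tnum. */
static struct tnum bound_offset(const struct reg *r, int fixed)
{
    struct tnum var64 = tnum_intersect(r->var_off, tnum_range(r->umin, r->umax));
    struct tnum src32 = fixed ? tnum_subreg(var64) : tnum_subreg(r->var_off);
    struct tnum var32 = tnum_intersect(src32, tnum_range(r->u32_min, r->u32_max));
    return tnum_or(tnum_clear_subreg(var64), var32);
}
```

Consider a register with an unknown `var_off` whose 64-bit bounds pin it to the constant 5 while its 32-bit bounds are unknown. The old ordering produces value 0 with all low 32 bits unknown. The fixed ordering produces the constant 5.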
      
      Conflicts:
      
      Documentation/bpf/bpf_devel_QA.rst
        b7abcd9c ("bpf, doc: Link to submitting-patches.rst for general patch submission info")
        0f10f647 ("bpf, docs: Use internal linking for link to netdev subsystem doc")
      https://lore.kernel.org/all/20230307095812.236eb1be@canb.auug.org.au/
      
      include/net/ip_tunnels.h
        bc9d003d ("ip_tunnel: Preserve pointer const in ip_tunnel_info_opts")
        ac931d4c ("ipip,ip_tunnel,sit: Add FOU support for externally controlled ipip devices")
      https://lore.kernel.org/all/20230413161235.4093777-1-broonie@kernel.org/
      
      net/bpf/test_run.c
        e5995bc7 ("bpf, test_run: fix crashes due to XDP frame overwriting/corruption")
        294635a8 ("bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES")
      https://lore.kernel.org/all/20230320102619.05b80a98@canb.auug.org.au/
      ====================
      
      Link: https://lore.kernel.org/r/20230413191525.7295-1-daniel@iogearbox.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 800e68c4
      Jakub Kicinski authored
      Conflicts:
      
      tools/testing/selftests/net/config
        62199e3f ("selftests: net: Add VXLAN MDB test")
        3a0385be ("selftests: add the missing CONFIG_IP_SCTP in net config")

      Signed-off-by: Jakub Kicinski <kuba@kernel.org>