1. 22 Mar, 2024 3 commits
  2. 21 Mar, 2024 2 commits
    • bpf-next: Avoid goto in regs_refine_cond_op() · 4c2a26fc
      Harishankar Vishwanathan authored
      In case of GE/GT/SGE/SGT instructions, regs_refine_cond_op()
      reuses the logic that does analysis of LE/LT/SLE/SLT instructions.
      This commit avoids the use of a goto to perform the reuse.
      Signed-off-by: Harishankar Vishwanathan <harishankar.vishwanathan@gmail.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20240321002955.808604-1-harishankar.vishwanathan@gmail.com
    • bpftool: Clean up HOST_CFLAGS, HOST_LDFLAGS for bootstrap bpftool · cc9b22df
      Quentin Monnet authored
      Bpftool's Makefile uses $(HOST_CFLAGS) to build the bootstrap version of
      bpftool, in order to pick the flags for the host (where we run the
      bootstrap version) and not for the target system (where we plan to run
      the full bpftool binary). But we pass too much information through this
      variable.
      
      In particular, we set HOST_CFLAGS by copying most of the $(CFLAGS); but
      we do this after the feature detection for bpftool, which means that
      $(CFLAGS), hence $(HOST_CFLAGS), contain all macro definitions for using
      the different optional features. For example, -DHAVE_LLVM_SUPPORT may be
      passed to the $(HOST_CFLAGS), even though the LLVM disassembler is not
      used in the bootstrap version, and the related library may even be
      missing for the host architecture.
      
      A similar thing happens with the $(LDFLAGS), which we use unchanged for
      linking the bootstrap version even though they may contain flags to
      link against additional libraries.
      
      To address the $(HOST_CFLAGS) issue, we move the definition of
      $(HOST_CFLAGS) earlier in the Makefile, before the $(CFLAGS) update
      resulting from the feature probing, none of which is relevant to the
      bootstrap version. To clean up the $(LDFLAGS) for the bootstrap version,
      we introduce a dedicated $(HOST_LDFLAGS) variable that we base on
      $(LDFLAGS), before the feature probing as well.
      
      On my setup, the following macros and libraries are removed from the
      compiler invocation to build bpftool after this patch:
      
        -DUSE_LIBCAP
        -DHAVE_LLVM_SUPPORT
        -I/usr/lib/llvm-17/include
        -D_GNU_SOURCE
        -D__STDC_CONSTANT_MACROS
        -D__STDC_FORMAT_MACROS
        -D__STDC_LIMIT_MACROS
        -lLLVM-17
        -L/usr/lib/llvm-17/lib
      
      Another advantage of cleaning up these flags is that displaying
      available features with "bpftool version" becomes more accurate for the
      bootstrap bpftool, and no longer reflects the features detected (and
      available only) for the final binary.
      
      Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
      Signed-off-by: Quentin Monnet <qmo@kernel.org>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Message-ID: <20240320014103.45641-1-qmo@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  3. 20 Mar, 2024 9 commits
    • selftests/bpf: scale benchmark counting by using per-CPU counters · 520fad2e
      Andrii Nakryiko authored
      When benchmarking with multiple threads (-pN, where N>1), we start
      contending on a single atomic counter that both BPF trigger benchmarks
      are using, as well as "baseline" tests in user space (trig-base and
      trig-uprobe-base benchmarks). As such, we start bottlenecking on
      something completely irrelevant to the benchmark at hand.
      
      Scale counting up by using per-CPU counters on the BPF side. On the user
      space side we do the next best thing: hash thread ID to approximate
      per-CPU behavior. It seems to work quite well in practice.
      
      To demonstrate the difference, I ran three benchmarks with 1, 2, 4, 8,
      16, and 32 threads:
        - trig-uprobe-base (no syscalls, pure tight counting loop in user-space);
        - trig-base (get_pgid() syscall, atomic counter in user-space);
        - trig-fentry (syscall to trigger fentry program, atomic uncontended per-CPU
          counter on BPF side).
      
      Command used:
      
        for b in uprobe-base base fentry; do \
          for p in 1 2 4 8 16 32; do \
            printf "%-11s %2d: %s\n" $b $p \
              "$(sudo ./bench -w2 -d5 -a -p$p trig-$b | tail -n1 | cut -d'(' -f1 | cut -d' ' -f3-)"; \
          done; \
        done
      
      Before these changes, aggregate throughput across all threads doesn't
      scale well with the number of threads. It even falls sharply for
      uprobe-base due to very high contention:
      
        uprobe-base  1:  138.998 ± 0.650M/s
        uprobe-base  2:   70.526 ± 1.147M/s
        uprobe-base  4:   63.114 ± 0.302M/s
        uprobe-base  8:   54.177 ± 0.138M/s
        uprobe-base 16:   45.439 ± 0.057M/s
        uprobe-base 32:   37.163 ± 0.242M/s
        base         1:   16.940 ± 0.182M/s
        base         2:   19.231 ± 0.105M/s
        base         4:   21.479 ± 0.038M/s
        base         8:   23.030 ± 0.037M/s
        base        16:   22.034 ± 0.004M/s
        base        32:   18.152 ± 0.013M/s
        fentry       1:   14.794 ± 0.054M/s
        fentry       2:   17.341 ± 0.055M/s
        fentry       4:   23.792 ± 0.024M/s
        fentry       8:   21.557 ± 0.047M/s
        fentry      16:   21.121 ± 0.004M/s
        fentry      32:   17.067 ± 0.023M/s
      
      After these changes, we see almost perfect linear scaling, as expected.
      The sub-linear scaling when going from 8 to 16 threads is interesting
      and consistent on my test machine, but I haven't investigated what is
      causing this peculiar slowdown (it shows up across all benchmarks and
      could be due to hyperthreading effects, not sure).
      
        uprobe-base  1:  139.980 ± 0.648M/s
        uprobe-base  2:  270.244 ± 0.379M/s
        uprobe-base  4:  532.044 ± 1.519M/s
        uprobe-base  8: 1004.571 ± 3.174M/s
        uprobe-base 16: 1720.098 ± 0.744M/s
        uprobe-base 32: 3506.659 ± 8.549M/s
        base         1:   16.869 ± 0.071M/s
        base         2:   33.007 ± 0.092M/s
        base         4:   64.670 ± 0.203M/s
        base         8:  121.969 ± 0.210M/s
        base        16:  207.832 ± 0.112M/s
        base        32:  424.227 ± 1.477M/s
        fentry       1:   14.777 ± 0.087M/s
        fentry       2:   28.575 ± 0.146M/s
        fentry       4:   56.234 ± 0.176M/s
        fentry       8:  106.095 ± 0.385M/s
        fentry      16:  181.440 ± 0.032M/s
        fentry      32:  369.131 ± 0.693M/s
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Message-ID: <20240315213329.1161589-1-andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpftool: Remove unnecessary source files from bootstrap version · e9a826dd
      Quentin Monnet authored
      Commit d510296d ("bpftool: Use syscall/loader program in "prog load"
      and "gen skeleton" command.") added new files to the list of objects to
      compile in order to build the bootstrap version of bpftool. As far as I
      can tell, these objects are unnecessary and were added by mistake; maybe
      a draft version intended to add support for loading loader programs from
      the bootstrap version. Anyway, we can remove these object files from the
      list to make the bootstrap bpftool binary a tad smaller and faster to
      build.
      
      Fixes: d510296d ("bpftool: Use syscall/loader program in "prog load" and "gen skeleton" command.")
      Signed-off-by: Quentin Monnet <qmo@kernel.org>
      Message-ID: <20240320013457.44808-1-qmo@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpftool: Enable libbpf logs when loading pid_iter in debug mode · be24a895
      Quentin Monnet authored
      When trying to load the pid_iter BPF program used to iterate over the
      PIDs of the processes holding file descriptors to BPF links, we would
      unconditionally silence libbpf in order to keep the output clean if the
      kernel does not support iterators and loading fails.
      
      Although this is the desirable behaviour in most cases, this may hide
      bugs in the pid_iter program that prevent it from loading, and it makes
      it hard to debug such load failures, even in "debug" mode. Instead, it
      makes more sense to print libbpf's logs when we pass the -d|--debug flag
      to bpftool, so that users get the logs to investigate failures without
      having to edit bpftool's source code.
      Signed-off-by: Quentin Monnet <qmo@kernel.org>
      Message-ID: <20240320012241.42991-1-qmo@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • Merge branch 'bpf-raw-tracepoint-support-for-bpf-cookie' · 2e244a72
      Alexei Starovoitov authored
      Andrii Nakryiko says:
      
      ====================
      BPF raw tracepoint support for BPF cookie
      
      Add ability to specify and retrieve BPF cookie for raw tracepoint programs.
      Both BTF-aware (SEC("tp_btf")) and non-BTF-aware (SEC("raw_tp")) are
      supported, as they are exactly the same at runtime.
      
      This issue recently came up in production use cases, where customers tried to
      switch from slower classic tracepoints to raw tracepoints and ran into this
      limitation. Luckily, it's not that hard to support this for raw_tp programs.
      
      v2->v3:
        - s/bpf_raw_tp_open/bpf_raw_tracepoint_open_opts/ (Alexei, Eduard);
      v1->v2:
        - fixed type definition for stubs of bpf_probe_{register,unregister};
        - added __u32 :u32 and aligned raw_tp fields (Jiri);
        - added Stanislav's ack.
      ====================
      
      Link: https://lore.kernel.org/r/20240319233852.1977493-1-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: add raw_tp/tp_btf BPF cookie subtests · 51146ff0
      Andrii Nakryiko authored
      Add test validating BPF cookie can be passed during raw_tp/tp_btf
      attachment and can be retrieved at runtime with bpf_get_attach_cookie()
      helper.
      Acked-by: Stanislav Fomichev <sdf@google.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Message-ID: <20240319233852.1977493-6-andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • libbpf: add support for BPF cookie for raw_tp/tp_btf programs · 36ffb202
      Andrii Nakryiko authored
      Wire up BPF cookie passing for raw_tp and tp_btf programs, both in
      low-level and high-level APIs.
      Acked-by: Stanislav Fomichev <sdf@google.com>
      Acked-by: Eduard Zingerman <eddyz87@gmail.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Message-ID: <20240319233852.1977493-5-andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: support BPF cookie in raw tracepoint (raw_tp, tp_btf) programs · 68ca5d4e
      Andrii Nakryiko authored
      Wire up BPF cookie for raw tracepoint programs (both BTF and non-BTF
      aware variants). This brings them up to par w.r.t. BPF cookie usage
      with classic tracepoint and fentry/fexit programs.
      Acked-by: Stanislav Fomichev <sdf@google.com>
      Acked-by: Eduard Zingerman <eddyz87@gmail.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Message-ID: <20240319233852.1977493-4-andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: pass whole link instead of prog when triggering raw tracepoint · d4dfc570
      Andrii Nakryiko authored
      Instead of passing prog as an argument to bpf_trace_runX() helpers, which
      are called from tracepoint triggering calls, store BPF link itself
      (struct bpf_raw_tp_link for raw tracepoints). This will allow passing
      extra information like BPF cookie into raw tracepoint registration.
      
      Instead of replacing `struct bpf_prog *prog = __data;` with
      corresponding `struct bpf_raw_tp_link *link = __data;` assignment in
      `__bpf_trace_##call` I just passed `__data` through into underlying
      bpf_trace_runX() call. This works well because we implicitly cast `void *`,
      and it also avoids naming clashes with arguments coming from
      tracepoint's "proto" list. We could have run into the same problem with
      "prog", we just happened to not have a tracepoint that has "prog" input
      argument. We are less lucky with "link", as there are tracepoints using
      "link" argument name already. So instead of trying to avoid naming
      conflicts, let's just remove the intermediate local variable. It doesn't
      hurt readability; either way it's a bit of a maze of calls and macros
      that requires careful reading.
      Acked-by: Stanislav Fomichev <sdf@google.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Message-ID: <20240319233852.1977493-3-andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: flatten bpf_probe_register call chain · 6b9c2950
      Andrii Nakryiko authored
      bpf_probe_register() and __bpf_probe_register() have identical
      signatures and bpf_probe_register() just redirects to
      __bpf_probe_register(). So get rid of this extra function call step to
      simplify following the source code.
      
      It makes no difference at runtime due to inlining, of course.
      Acked-by: Stanislav Fomichev <sdf@google.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Message-ID: <20240319233852.1977493-2-andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  4. 19 Mar, 2024 8 commits
  5. 18 Mar, 2024 4 commits
  6. 15 Mar, 2024 4 commits
  7. 14 Mar, 2024 6 commits
    • Merge branch 'ignore-additional-fields-in-the-struct_ops-maps-in-an-updated-version' · 6cda7e17
      Andrii Nakryiko authored
      Kui-Feng Lee says:
      
      ====================
      Ignore additional fields in the struct_ops maps in an updated version.
      
      According to an offline discussion, it would be beneficial to
      implement a backward-compatible method for struct_ops types with
      additional fields that are not present in older kernels.
      
      This patchset accepts additional fields of a struct_ops map with all
      zero values even if these fields are not in the corresponding type in
      the kernel. This provides a way to be backward compatible. User space
      programs can use the same map on a machine running an old kernel by
      clearing fields that do not exist in the kernel.
      
      For example, in a test case, it adds an additional field "zeroed" that
      doesn't exist in struct bpf_testmod_ops of the kernel.
      
          struct bpf_testmod_ops___zeroed {
          	int (*test_1)(void);
          	void (*test_2)(int a, int b);
          	int (*test_maybe_null)(int dummy, struct task_struct *task);
          	int zeroed;
          };
      
          SEC(".struct_ops.link")
          struct bpf_testmod_ops___zeroed testmod_zeroed = {
          	.test_1 = (void *)test_1,
          	.test_2 = (void *)test_2_v2,
          };
      
      Here, it doesn't assign a value to "zeroed" of testmod_zeroed, and by
      default the value of this field will be zero. So, the map will be
      accepted by libbpf, but libbpf will skip the "zeroed" field. However,
      if the "zeroed" field is assigned any value other than "0", libbpf
      will refuse to load this map.
      ---
      Changes from v1:
      
       - Fix the issue about function pointer fields.
      
       - Change a warning message, and add an info message for skipping
         fields.
      
       - Add a small demo of additional arguments that are not in the
         function pointer prototype in the kernel.
      
      v1: https://lore.kernel.org/all/20240312183245.341141-1-thinker.li@gmail.com/
      
      Kui-Feng Lee (3):
        libbpf: Skip zeroed or null fields if not found in the kernel type.
        selftests/bpf: Ensure libbpf skip all-zeros fields of struct_ops maps.
        selftests/bpf: Accept extra arguments if they are not used.
      
       tools/lib/bpf/libbpf.c                        |  24 +++-
       .../bpf/prog_tests/test_struct_ops_module.c   | 103 ++++++++++++++++++
       .../bpf/progs/struct_ops_extra_arg.c          |  49 +++++++++
       .../selftests/bpf/progs/struct_ops_module.c   |  16 ++-
       4 files changed, 186 insertions(+), 6 deletions(-)
       create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_extra_arg.c
      ====================
      
      Link: https://lore.kernel.org/r/20240313214139.685112-1-thinker.li@gmail.com
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    • selftests/bpf: Ensure libbpf skip all-zeros fields of struct_ops maps. · 26a7cf2b
      Kui-Feng Lee authored
      A new version of a type may have additional fields that do not exist in
      older versions. Previously, libbpf would reject struct_ops maps with a new
      version containing extra fields when running on a machine with an old
      kernel. However, we have updated libbpf to ignore these fields if their
      values are all zeros or null in order to provide backward compatibility.
      Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20240313214139.685112-3-thinker.li@gmail.com
    • libbpf: Skip zeroed or null fields if not found in the kernel type. · c911fc61
      Kui-Feng Lee authored
      Accept additional fields of a struct_ops type with all zero values even if
      these fields are not in the corresponding type in the kernel. This provides
      a way to be backward compatible. User space programs can use the same map
      on a machine running an old kernel by clearing fields that do not exist in
      the kernel.
      Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20240313214139.685112-2-thinker.li@gmail.com
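The acceptance rule described here (a member unknown to the kernel is tolerated only when its bytes are all zero) can be sketched as follows. This is a simplified illustration, not libbpf's implementation: `mem_is_zeroed()`, `struct ops_v2` and `can_skip_unknown_member()` are hypothetical names, and the "kernel type" is simply assumed to lack the `zeroed` member.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if all len bytes starting at p are zero. */
static bool mem_is_zeroed(const void *p, size_t len)
{
	const unsigned char *bytes = p;

	assert(p != NULL || len == 0);
	for (size_t i = 0; i < len; i++) {
		if (bytes[i])
			return false;
	}
	return true;
}

/* Hypothetical user-side type; assume the kernel's version has no "zeroed". */
struct ops_v2 {
	int (*test_1)(void);
	int zeroed;
};

/* A member missing from the kernel type may be skipped only if it is zero. */
static bool can_skip_unknown_member(const struct ops_v2 *ops)
{
	return mem_is_zeroed(&ops->zeroed, sizeof(ops->zeroed));
}
```

A map whose extra member was left at its default passes the check, while one that sets the member to a non-zero value would be rejected, which mirrors the behaviour the commit message describes.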
    • libbpf: Prevent null-pointer dereference when prog to load has no BTF · 9bf48fa1
      Quentin Monnet authored
      In bpf_object_load_prog(), there's no guarantee that obj->btf is non-NULL
      when passing it to btf__fd(), and this function does not perform any
      check before dereferencing its argument (as bpf_object__btf_fd() used to
      do). As a consequence, we get segmentation fault errors in bpftool (for
      example) when trying to load programs that come without BTF information.
      
      v2: Keep btf__fd() in the fix instead of reverting to bpf_object__btf_fd().
      
      Fixes: df7c3f7d ("libbpf: make uniform use of btf__fd() accessor inside libbpf")
      Suggested-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Quentin Monnet <qmo@kernel.org>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20240314150438.232462-1-qmo@kernel.org
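The shape of this kind of fix, checking the pointer at the call site because the accessor dereferences unconditionally, is sketched below. Every name here (`struct fake_btf`, `struct fake_object`, `fake_btf_fd()`, `object_btf_fd()`) is a hypothetical stand-in and not libbpf's definition; returning -1 for "no BTF" is likewise an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not libbpf's real definitions. */
struct fake_btf {
	int fd;
};

struct fake_object {
	struct fake_btf *btf;	/* may be NULL when the object carries no BTF */
};

/* Like the accessor described above: dereferences without any check. */
static int fake_btf_fd(const struct fake_btf *btf)
{
	return btf->fd;
}

/* Guard at the call site: only call the accessor when BTF is present. */
static int object_btf_fd(const struct fake_object *obj)
{
	assert(obj != NULL);
	if (!obj->btf)
		return -1;	/* assumed "no BTF" value for this sketch */
	return fake_btf_fd(obj->btf);
}
```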
    • bpftool: Fix missing pids during link show · fe879bb4
      Yonghong Song authored
      The current 'bpftool link' command does not show pids, e.g.,
        $ tools/build/bpftool/bpftool link
        ...
        4: tracing  prog 23
              prog_type lsm  attach_type lsm_mac
              target_obj_id 1  target_btf_id 31320
      
      Apply the following hack to enable normal libbpf debug output,
        --- a/tools/bpf/bpftool/pids.c
        +++ b/tools/bpf/bpftool/pids.c
        @@ -121,9 +121,9 @@ int build_obj_refs_table(struct hashmap **map, enum bpf_obj_type type)
                /* we don't want output polluted with libbpf errors if bpf_iter is not
                 * supported
                 */
        -       default_print = libbpf_set_print(libbpf_print_none);
        +       /* default_print = libbpf_set_print(libbpf_print_none); */
                err = pid_iter_bpf__load(skel);
        -       libbpf_set_print(default_print);
        +       /* libbpf_set_print(default_print); */
      
      Rerun the above bpftool command:
        $ tools/build/bpftool/bpftool link
        libbpf: prog 'iter': BPF program load failed: Permission denied
        libbpf: prog 'iter': -- BEGIN PROG LOAD LOG --
        0: R1=ctx() R10=fp0
        ; struct task_struct *task = ctx->task; @ pid_iter.bpf.c:69
        0: (79) r6 = *(u64 *)(r1 +8)          ; R1=ctx() R6_w=ptr_or_null_task_struct(id=1)
        ; struct file *file = ctx->file; @ pid_iter.bpf.c:68
        ...
        ; struct bpf_link *link = (struct bpf_link *) file->private_data; @ pid_iter.bpf.c:103
        80: (79) r3 = *(u64 *)(r8 +432)       ; R3_w=scalar() R8=ptr_file()
        ; if (link->type == bpf_core_enum_value(enum bpf_link_type___local, @ pid_iter.bpf.c:105
        81: (61) r1 = *(u32 *)(r3 +12)
        R3 invalid mem access 'scalar'
        processed 39 insns (limit 1000000) max_states_per_insn 0 total_states 3 peak_states 3 mark_read 2
        -- END PROG LOAD LOG --
        libbpf: prog 'iter': failed to load: -13
        ...
      
      'file->private_data' has type 'void *', so the loaded value is treated as a
      scalar, and the subsequent 'link->type' access (insn #81) fails verification.
      
      To fix the issue, restore the previous BPF_CORE_READ so old kernels can also work.
      With this patch, the 'bpftool link' runs successfully with 'pids'.
        $ tools/build/bpftool/bpftool link
        ...
        4: tracing  prog 23
              prog_type lsm  attach_type lsm_mac
              target_obj_id 1  target_btf_id 31320
              pids systemd(1)
      
      Fixes: 44ba7b30 ("bpftool: Use a local copy of BPF_LINK_TYPE_PERF_EVENT in pid_iter.bpf.c")
      Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Tested-by: Quentin Monnet <quentin@isovalent.com>
      Reviewed-by: Quentin Monnet <quentin@isovalent.com>
      Link: https://lore.kernel.org/bpf/20240312023249.3776718-1-yonghong.song@linux.dev
    • bpftool: Cast pointers for shadow types explicitly. · c2a0257c
      Kui-Feng Lee authored
      According to a report, skeletons fail to assign shadow pointers when
      compiled as part of C++ programs. Unlike C, which implicitly converts
      void pointers, C++ requires an explicit cast.
      
      To support C++, we do explicit casting for each shadow pointer.
      
      Also add struct_ops_module.skel.h to test_cpp to validate C++
      compilation as part of BPF selftests.
      Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Yonghong Song <yonghong.song@linux.dev>
      Acked-by: Quentin Monnet <quentin@isovalent.com>
      Link: https://lore.kernel.org/bpf/20240312013726.1780720-1-thinker.li@gmail.com
  8. 13 Mar, 2024 1 commit
    • Merge tag 'net-next-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next · 9187210e
      Linus Torvalds authored
      Pull networking updates from Jakub Kicinski:
       "Core & protocols:
      
         - Large effort by Eric to lower rtnl_lock pressure and remove locks:
      
            - Make commonly used parts of rtnetlink (address, route dumps
              etc) lockless, protected by RCU instead of rtnl_lock.
      
            - Add a netns exit callback which already holds rtnl_lock,
              allowing netns exit to take rtnl_lock once in the core instead
              of once for each driver / callback.
      
            - Remove locks / serialization in the socket diag interface.
      
            - Remove 6 calls to synchronize_rcu() while holding rtnl_lock.
      
            - Remove the dev_base_lock, depend on RCU where necessary.
      
         - Support busy polling on a per-epoll context basis. Poll length and
           budget parameters can be set independently of system defaults.
      
         - Introduce struct net_hotdata, to make sure read-mostly global
           config variables fit in as few cache lines as possible.
      
         - Add optional per-nexthop statistics to ease monitoring / debug of
           ECMP imbalance problems.
      
         - Support TCP_NOTSENT_LOWAT in MPTCP.
      
         - Ensure that IPv6 temporary addresses' preferred lifetimes are long
           enough, compared to other configured lifetimes, and at least 2 sec.
      
         - Support forwarding of ICMP Error messages in IPSec, per RFC 4301.
      
         - Add support for the independent control state machine for bonding
           per IEEE 802.1AX-2008 5.4.15 in addition to the existing coupled
           control state machine.
      
         - Add "network ID" to MCTP socket APIs to support hosts with multiple
           disjoint MCTP networks.
      
         - Re-use the mono_delivery_time skbuff bit for packets which user
           space wants to be sent at a specified time. Maintain the timing
           information while traversing veth links, bridge etc.
      
         - Take advantage of MSG_SPLICE_PAGES for RxRPC DATA and ACK packets.
      
         - Simplify many places iterating over netdevs by using an xarray
           instead of a hash table walk (hash table remains in place, for use
           on fastpaths).
      
         - Speed up scanning for expired routes by keeping a dedicated list.
      
         - Speed up "generic" XDP by trying harder to avoid large allocations.
      
         - Support attaching arbitrary metadata to netconsole messages.
      
        Things we sprinkled into general kernel code:
      
         - Enforce VM_IOREMAP flag and range in ioremap_page_range and
           introduce VM_SPARSE kind and vm_area_[un]map_pages (used by
           bpf_arena).
      
         - Rework selftest harness to enable the use of the full range of ksft
           exit code (pass, fail, skip, xfail, xpass).
      
        Netfilter:
      
         - Allow userspace to define a table that is exclusively owned by a
           daemon (via netlink socket aliveness) without auto-removing this
           table when the userspace program exits. Such table gets marked as
           orphaned and a restarting management daemon can re-attach/regain
           ownership.
      
         - Speed up element insertions to nftables' concatenated-ranges set
           type. Compact a few related data structures.
      
        BPF:
      
         - Add BPF token support for delegating a subset of BPF subsystem
           functionality from privileged system-wide daemons such as systemd
           through special mount options for userns-bound BPF fs to a trusted
           & unprivileged application.
      
         - Introduce bpf_arena, which is a sparse shared memory region between
           a BPF program and user space where structures inside the arena can
           have pointers to other areas of the arena, and pointers work
           seamlessly for both user-space programs and BPF programs.
      
         - Introduce may_goto instruction that is a contract between the
           verifier and the program. The verifier allows the program to loop
           assuming it's behaving well, but reserves the right to terminate
           it.
      
         - Extend the BPF verifier to enable static subprog calls in spin lock
           critical sections.
      
         - Support registration of struct_ops types from modules which helps
           projects like fuse-bpf that seek to implement a new struct_ops
           type.
      
         - Add support for retrieval of cookies for perf/kprobe multi links.
      
         - Support arbitrary TCP SYN cookie generation / validation in the TC
           layer with BPF to allow creating SYN flood handling in BPF
           firewalls.
      
         - Add code generation to inline the bpf_kptr_xchg() helper which
           improves performance when stashing/popping the allocated BPF
           objects.
      
        Wireless:
      
         - Add SPP (signaling and payload protected) AMSDU support.
      
         - Support wider bandwidth OFDMA, as required for EHT operation.
      
        Driver API:
      
         - Major overhaul of the Energy Efficient Ethernet internals to
           support new link modes (2.5GE, 5GE), share more code between
           drivers (especially those using phylib), and encourage more
           uniform behavior. Convert and clean up drivers.
      
         - Define an API for querying per netdev queue statistics from
           drivers.
      
         - IPSec: account in global stats for fully offloaded sessions.
      
         - Create a concept of Ethernet PHY Packages at the Device Tree level,
           to allow parameterizing the existing PHY package code.
      
         - Enable Rx hashing (RSS) on GTP protocol fields.
      
        Misc:
      
         - Improvements and refactoring all over networking selftests.
      
         - Create uniform module aliases for TC classifiers, actions, and
           packet schedulers to simplify creating modprobe policies.
      
         - Address all missing MODULE_DESCRIPTION() warnings in networking.
      
         - Extend the Netlink descriptions in YAML to cover message
           encapsulation or "Netlink polymorphism", where interpretation of
           nested attributes depends on link type, classifier type or some
           other "class type".
      
        Drivers:
      
         - Ethernet high-speed NICs:
            - Add a new driver for Marvell's Octeon PCI Endpoint NIC VF.
            - Intel (100G, ice, idpf):
               - support E825-C devices
            - nVidia/Mellanox:
               - support devices with one port and multiple PCIe links
            - Broadcom (bnxt):
               - support n-tuple filters
               - support configuring the RSS key
            - Wangxun (ngbe/txgbe):
               - implement irq_domain for TXGBE's sub-interrupts
            - Pensando/AMD:
               - support XDP
               - optimize queue submission and wakeup handling (+17% bps)
               - optimize struct layout, saving 28% of memory on queues
      
         - Ethernet NICs embedded and virtual:
            - Google cloud vNIC:
               - refactor driver to perform memory allocations for new queue
                 config before stopping and freeing the old queue memory
            - Synopsys (stmmac):
               - obey queueMaxSDU and implement counters required by 802.1Qbv
            - Renesas (ravb):
               - support packet checksum offload
               - suspend to RAM and runtime PM support
      
         - Ethernet switches:
            - nVidia/Mellanox:
               - support for nexthop group statistics
            - Microchip:
               - ksz8: implement PHY loopback
               - add support for KSZ8567, a 7-port 10/100Mbps switch
      
         - PTP:
            - New driver for Renesas FemtoClock3 Wireless clock generator.
            - Support OCP PTP cards designed and built by Adva.
      
         - CAN:
            - Support recvmsg() flags for own, local and remote traffic on CAN
              BCM sockets.
            - Support for esd GmbH PCIe/402 CAN device family.
            - m_can:
               - Rx/Tx submission coalescing
               - wake on frame Rx
      
         - WiFi:
            - Intel (iwlwifi):
               - enable signaling and payload protected A-MSDUs
               - support wider-bandwidth OFDMA
               - support for new devices
               - bump FW API to 89 for AX devices; 90 for BZ/SC devices
            - MediaTek (mt76):
               - mt7915: newer ADIE version support
               - mt7925: radio temperature sensor support
            - Qualcomm (ath11k):
               - support 6 GHz station power modes: Low Power Indoor (LPI),
                 Standard Power (SP) and Very Low Power (VLP)
               - QCA6390 & WCN6855: support 2 concurrent station interfaces
               - QCA2066 support
            - Qualcomm (ath12k):
               - refactoring in preparation for Multi-Link Operation (MLO)
                 support
               - 1024 Block Ack window size support
               - firmware-2.bin support
               - support having multiple identical PCI devices (firmware needs
                 to have ATH12K_FW_FEATURE_MULTI_QRTR_ID)
               - QCN9274: support split-PHY devices
               - WCN7850: enable Power Save Mode in station mode
               - WCN7850: P2P support
            - RealTek:
               - rtw88: support for more rtw8811cu and rtw8821cu devices
               - rtw89: support SCAN_RANDOM_SN and SET_SCAN_DWELL
               - rtlwifi: speed up USB firmware initialization
            - rtl8xxxu:
                   - RTL8188F: concurrent interface support
                   - Channel Switch Announcement (CSA) support in AP mode
            - Broadcom (brcmfmac):
               - per-vendor feature support
               - per-vendor SAE password setup
               - DMI nvram filename quirk for ACEPC W5 Pro"
      
      * tag 'net-next-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2255 commits)
        nexthop: Fix splat with CONFIG_DEBUG_PREEMPT=y
        nexthop: Fix out-of-bounds access during attribute validation
        nexthop: Only parse NHA_OP_FLAGS for dump messages that require it
        nexthop: Only parse NHA_OP_FLAGS for get messages that require it
        bpf: move sleepable flag from bpf_prog_aux to bpf_prog
        bpf: hardcode BPF_PROG_PACK_SIZE to 2MB * num_possible_nodes()
        selftests/bpf: Add kprobe multi triggering benchmarks
        ptp: Move from simple ida to xarray
        vxlan: Remove generic .ndo_get_stats64
        vxlan: Do not alloc tstats manually
        devlink: Add comments to use netlink gen tool
        nfp: flower: handle acti_netdevs allocation failure
        net/packet: Add getsockopt support for PACKET_COPY_THRESH
        net/netlink: Add getsockopt support for NETLINK_LISTEN_ALL_NSID
        selftests/bpf: Add bpf_arena_htab test.
        selftests/bpf: Add bpf_arena_list test.
        selftests/bpf: Add unit tests for bpf_arena_alloc/free_pages
        bpf: Add helper macro bpf_addr_space_cast()
        libbpf: Recognize __arena global variables.
        bpftool: Recognize arena map type
        ...
      9187210e
  9. 12 Mar, 2024 3 commits
    • Linus Torvalds's avatar
      Merge tag 'docs-6.9' of git://git.lwn.net/linux · 1f440397
      Linus Torvalds authored
      Pull documentation updates from Jonathan Corbet:
       "A moderately busy cycle for development this time around.
      
         - Some cleanup of the main index page for easier navigation
      
         - Rework some of the other top-level pages for better readability
           and, with luck, fewer merge conflicts in the future.
      
         - Submit-checklist improvements, hopefully the first of many.
      
         - New Italian translations
      
         - A fair number of kernel-doc fixes and improvements. We have also
           dropped the recommendation to use an old version of Sphinx.
      
         - A new document from Thorsten on bisection
      
        ... and lots of fixes and updates"
      
      * tag 'docs-6.9' of git://git.lwn.net/linux: (54 commits)
        docs: verify/bisect: fixes, finetuning, and support for Arch
        docs: Makefile: Add dependency to $(YNL_INDEX) for targets other than htmldocs
        docs: Move ja_JP/howto.rst to ja_JP/process/howto.rst
        docs: submit-checklist: use subheadings
        docs: submit-checklist: structure by category
        docs: new text on bisecting which also covers bug validation
        docs: drop the version constraints for sphinx and dependencies
        docs: kerneldoc-preamble.sty: Remove code for Sphinx <2.4
        docs: Restore "smart quotes" for quotes
        docs/zh_CN: accurate translation of "function"
        docs: Include simplified link titles in main index
        docs: Correct formatting of title in admin-guide/index.rst
        docs: kernel_feat.py: fix build error for missing files
        MAINTAINERS: Set the field name for subsystem profile section
        kasan: Add documentation for CONFIG_KASAN_EXTRA_INFO
        Fixed case issue with 'fault-injection' in documentation
        kernel-doc: handle #if in enums as well
        Documentation: update mailing list addresses
        doc: kerneldoc.py: fix indentation
        scripts/kernel-doc: simplify signature printing
        ...
      1f440397
    • Linus Torvalds's avatar
      Merge tag 'audit-pr-20240312' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit · 3749bda2
      Linus Torvalds authored
      Pull audit updates from Paul Moore:
       "Two small audit patches:
      
         - Use the KMEM_CACHE() macro instead of kmem_cache_create()
      
           The guidance appears to be to use the KMEM_CACHE() macro when
           possible and there is no reason why we can't use the macro, so
           let's use it.
      
         - Remove an unnecessary assignment in audit_dupe_lsm_field()
      
           A return value variable was assigned a value in its declaration,
           but the declaration value is overwritten before the return value
           variable is ever referenced; drop the assignment at declaration
           time"
      
      * tag 'audit-pr-20240312' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
        audit: use KMEM_CACHE() instead of kmem_cache_create()
        audit: remove unnecessary assignment in audit_dupe_lsm_field()
      3749bda2
    • Linus Torvalds's avatar
      Merge tag 'Smack-for-6.9' of https://github.com/cschaufler/smack-next · 681ba318
      Linus Torvalds authored
      Pull smack updates from Casey Schaufler:
      
       - Improvements to the initialization of in-memory inodes
      
       - A fix in ramfs to properly ensure the initialization of in-memory
         inodes
      
       - Removal of duplicated code in smack_cred_transfer()
      
      * tag 'Smack-for-6.9' of https://github.com/cschaufler/smack-next:
        Smack: use init_task_smack() in smack_cred_transfer()
        ramfs: Initialize security of in-memory inodes
        smack: Initialize the in-memory inode in smack_inode_init_security()
        smack: Always determine inode labels in smack_inode_init_security()
        smack: Handle SMACK64TRANSMUTE in smack_inode_setsecurity()
        smack: Set SMACK64TRANSMUTE only for dirs in smack_inode_setxattr()
      681ba318