1. 29 Dec, 2022 3 commits
  2. 28 Dec, 2022 7 commits
    • libbpf: fix errno is overwritten after being closed. · 07453245
      Xin Liu authored
      In the ensure_good_fd function, if the fcntl function succeeds but
      the close function fails, ensure_good_fd returns a normal fd and
      sets errno, which may cause users to misunderstand. The close
      failure is not a serious problem, and the correct FD has been
      handed over to the upper-layer application. Let's restore errno here.
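
      The errno handling described above can be sketched as follows. This
      is an illustration rather than the libbpf source: the name
      ensure_good_fd_sketch is hypothetical, and the use of
      fcntl(F_DUPFD_CLOEXEC) with a minimum descriptor of 3 is an assumption
      about how the descriptor is moved.

```c
#define _POSIX_C_SOURCE 200809L	/* for F_DUPFD_CLOEXEC */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for libbpf's ensure_good_fd(): move a descriptor
 * out of the 0..2 range and keep errno meaningful for the caller.
 */
static int ensure_good_fd_sketch(int fd)
{
	int old_fd = fd, saved_errno;

	if (fd < 0)
		return fd;		/* propagate an earlier failure as-is */
	if (fd < 3) {
		fd = fcntl(fd, F_DUPFD_CLOEXEC, 3);
		saved_errno = errno;	/* errno as left by fcntl() */
		close(old_fd);		/* may fail and overwrite errno */
		errno = saved_errno;	/* a failed close() must not leak out */
	}
	/* postcondition: either an error or a descriptor above stderr */
	assert(fd < 0 || fd >= 3);
	return fd;
}
```

      Saving errno right after fcntl() and restoring it after close() means
      the caller only ever sees the outcome of the duplication itself.
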
      Signed-off-by: Xin Liu <liuxin350@huawei.com>
      Link: https://lore.kernel.org/r/20221223133618.10323-1-liuxin350@huawei.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: fix regs_exact() logic in regsafe() to remap IDs correctly · 4633a006
      Andrii Nakryiko authored
      Comparing IDs exactly between two separate states is not just
      suboptimal, but also incorrect in some cases. So update regs_exact()
      check to do byte-by-byte memcmp() only up to id/ref_obj_id. For id and
      ref_obj_id perform proper check_ids() checks, taking into account idmap.
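
      A minimal sketch of that split comparison, assuming a trimmed-down
      register layout. struct reg_sketch, struct id_pair, ID_MAP_SIZE and
      regs_exact_sketch are hypothetical names for illustration; only
      regs_exact(), check_ids() and the id/ref_obj_id fields come from the
      description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ID_MAP_SIZE 8	/* hypothetical size, for illustration only */

struct id_pair {
	uint32_t old, cur;
};

/* Hypothetical, trimmed-down stand-in for struct bpf_reg_state. */
struct reg_sketch {
	uint32_t type;
	int32_t off;
	uint64_t umin, umax;
	/* everything above is compared byte-by-byte; IDs below are remapped */
	uint32_t id;
	uint32_t ref_obj_id;
};

/* Record or verify that old_id in the old state corresponds to cur_id. */
static bool check_ids(uint32_t old_id, uint32_t cur_id, struct id_pair *idmap)
{
	if (!old_id != !cur_id)		/* set in one state but not the other */
		return false;
	if (old_id == 0)
		return true;
	for (size_t i = 0; i < ID_MAP_SIZE; i++) {
		if (idmap[i].old == 0) {	/* first sighting: record the pair */
			idmap[i].old = old_id;
			idmap[i].cur = cur_id;
			return true;
		}
		if (idmap[i].old == old_id)
			return idmap[i].cur == cur_id;
	}
	return false;			/* map exhausted */
}

static bool regs_exact_sketch(const struct reg_sketch *rold,
			      const struct reg_sketch *rcur,
			      struct id_pair *idmap)
{
	/* exact match only up to (not including) the id field */
	return memcmp(rold, rcur, offsetof(struct reg_sketch, id)) == 0 &&
	       check_ids(rold->id, rcur->id, idmap) &&
	       check_ids(rold->ref_obj_id, rcur->ref_obj_id, idmap);
}
```

      Two registers whose IDs differ numerically (say 5 in the old state
      and 9 in the current one) compare as equal here, as long as every
      later use of ID 5 also maps to 9.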
      
      This change makes more states equivalent, improving insns and states
      stats across a bunch of selftest BPF programs:
      
      File                                         Program                           Insns (A)  Insns (B)  Insns   (DIFF)  States (A)  States (B)  States (DIFF)
      -------------------------------------------  --------------------------------  ---------  ---------  --------------  ----------  ----------  -------------
      cgrp_kfunc_success.bpf.linked1.o             test_cgrp_get_release                   141        137     -4 (-2.84%)          13          13    +0 (+0.00%)
      cgrp_kfunc_success.bpf.linked1.o             test_cgrp_xchg_release                  142        139     -3 (-2.11%)          14          13    -1 (-7.14%)
      connect6_prog.bpf.linked1.o                  connect_v6_prog                         139        102   -37 (-26.62%)           9           6   -3 (-33.33%)
      ima.bpf.linked1.o                            bprm_creds_for_exec                      68         61    -7 (-10.29%)           6           5   -1 (-16.67%)
      linked_list.bpf.linked1.o                    global_list_in_list                     569        499   -70 (-12.30%)          60          52   -8 (-13.33%)
      linked_list.bpf.linked1.o                    global_list_push_pop                    167        150   -17 (-10.18%)          18          16   -2 (-11.11%)
      linked_list.bpf.linked1.o                    global_list_push_pop_multiple           881        815    -66 (-7.49%)          74          63  -11 (-14.86%)
      linked_list.bpf.linked1.o                    inner_map_list_in_list                  579        534    -45 (-7.77%)          61          55    -6 (-9.84%)
      linked_list.bpf.linked1.o                    inner_map_list_push_pop                 190        181     -9 (-4.74%)          19          18    -1 (-5.26%)
      linked_list.bpf.linked1.o                    inner_map_list_push_pop_multiple        916        850    -66 (-7.21%)          75          64  -11 (-14.67%)
      linked_list.bpf.linked1.o                    map_list_in_list                        588        525   -63 (-10.71%)          62          55   -7 (-11.29%)
      linked_list.bpf.linked1.o                    map_list_push_pop                       183        174     -9 (-4.92%)          18          17    -1 (-5.56%)
      linked_list.bpf.linked1.o                    map_list_push_pop_multiple              909        843    -66 (-7.26%)          75          64  -11 (-14.67%)
      map_kptr.bpf.linked1.o                       test_map_kptr                           264        256     -8 (-3.03%)          26          26    +0 (+0.00%)
      map_kptr.bpf.linked1.o                       test_map_kptr_ref                        95         91     -4 (-4.21%)           9           8   -1 (-11.11%)
      task_kfunc_success.bpf.linked1.o             test_task_xchg_release                  139        136     -3 (-2.16%)          14          13    -1 (-7.14%)
      test_bpf_nf.bpf.linked1.o                    nf_skb_ct_test                          815        509  -306 (-37.55%)          57          30  -27 (-47.37%)
      test_bpf_nf.bpf.linked1.o                    nf_xdp_ct_test                          815        509  -306 (-37.55%)          57          30  -27 (-47.37%)
      test_cls_redirect.bpf.linked1.o              cls_redirect                          78925      78390   -535 (-0.68%)        4782        4704   -78 (-1.63%)
      test_cls_redirect_subprogs.bpf.linked1.o     cls_redirect                          64901      63897  -1004 (-1.55%)        4612        4470  -142 (-3.08%)
      test_sk_lookup.bpf.linked1.o                 access_ctx_sk                           181         95   -86 (-47.51%)          19          10   -9 (-47.37%)
      test_sk_lookup.bpf.linked1.o                 ctx_narrow_access                       447        437    -10 (-2.24%)          38          37    -1 (-2.63%)
      test_sk_lookup_kern.bpf.linked1.o            sk_lookup_success                       148        133   -15 (-10.14%)          14          12   -2 (-14.29%)
      test_tcp_check_syncookie_kern.bpf.linked1.o  check_syncookie_clsact                  304        300     -4 (-1.32%)          23          22    -1 (-4.35%)
      test_tcp_check_syncookie_kern.bpf.linked1.o  check_syncookie_xdp                     304        300     -4 (-1.32%)          23          22    -1 (-4.35%)
      test_verify_pkcs7_sig.bpf.linked1.o          bpf                                      87         76   -11 (-12.64%)           7           6   -1 (-14.29%)
      -------------------------------------------  --------------------------------  ---------  ---------  --------------  ----------  ----------  -------------
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20221223054921.958283-7-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: perform byte-by-byte comparison only when necessary in regsafe() · 4a95c85c
      Andrii Nakryiko authored
      Extract byte-by-byte comparison of bpf_reg_state in regsafe() into
      a helper function, which makes it more convenient to use it "on demand"
      only for registers that benefit from such checks, instead of doing it
      all the time, even if result of such comparison is ignored.
      
      Also, remove WARN_ON_ONCE(1)+return false dead code. There is no risk of
      missing some case as compiler will warn about non-void function not
      returning value in some branches (and that under assumption that default
      case is removed in the future).
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20221223054921.958283-6-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: reject non-exact register type matches in regsafe() · 910f6999
      Andrii Nakryiko authored
      Generalize the (somewhat implicit) rule of regsafe(), which states that
      if register types in old and current states do not match *exactly*, they
      can't be safely considered equivalent.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20221223054921.958283-5-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: generalize MAYBE_NULL vs non-MAYBE_NULL rule · 7f4ce97c
      Andrii Nakryiko authored
      Add a generic check that prevents XXX_OR_NULL and XXX register types
      from being intermixed. While technically it could be safe in some
      situations, it's impossible to enforce due to the loss of an ID when
      converting XXX_OR_NULL to its non-NULL variant. So prevent this in
      general, not just for PTR_TO_MAP_KEY and PTR_TO_MAP_VALUE.
      
      PTR_TO_MAP_KEY_OR_NULL and PTR_TO_MAP_VALUE_OR_NULL checks, which were
      previously special-cased, are simplified to a generic check that takes
      into account range_within() and tnum_in(). This is correct as the BPF
      verifier doesn't allow arithmetic on XXX_OR_NULL register types, so
      var_off and ranges should stay zero. If this restriction is lifted in
      the future, it becomes even more important to enforce that var_off
      and ranges are compatible, otherwise it's possible to construct a case
      where this can be exploited to bypass the verifier's memory range
      safety checks.
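
      The generic rule can be sketched as below. The struct layout and the
      maybe_null flag are hypothetical simplifications for illustration;
      range_within() and tnum_in() are named in the text, and their bodies
      here are a plausible reading of those helpers rather than a copy of
      the verifier source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Tracked number: bits set in mask are unknown, the rest equal value. */
struct tnum {
	uint64_t value, mask;
};

/* Hypothetical, reduced register state. */
struct reg_sketch {
	bool maybe_null;	/* stands in for the XXX_OR_NULL type flag */
	uint64_t umin, umax;
	int64_t smin, smax;
	struct tnum var_off;
};

/* True if every concrete value allowed by b is also allowed by a. */
static bool tnum_in(struct tnum a, struct tnum b)
{
	if (b.mask & ~a.mask)		/* b has unknown bits that a fixes */
		return false;
	b.value &= ~a.mask;		/* ignore bits a leaves unknown */
	return a.value == b.value;
}

/* True if the current bounds lie inside the already-verified old bounds. */
static bool range_within(const struct reg_sketch *old,
			 const struct reg_sketch *cur)
{
	return old->umin <= cur->umin && old->umax >= cur->umax &&
	       old->smin <= cur->smin && old->smax >= cur->smax;
}

static bool maybe_null_rule_sketch(const struct reg_sketch *old,
				   const struct reg_sketch *cur)
{
	/* XXX_OR_NULL and XXX are never considered equivalent */
	if (old->maybe_null != cur->maybe_null)
		return false;
	return range_within(old, cur) && tnum_in(old->var_off, cur->var_off);
}
```
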
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20221223054921.958283-4-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: reorganize struct bpf_reg_state fields · a73bf9f2
      Andrii Nakryiko authored
      Move id and ref_obj_id fields after scalar data section (var_off and
      ranges). This is necessary to simplify the next patch, which will change
      regsafe()'s logic to be safer, as it makes the contents that have to be
      an exact match (type-specific parts, off, type, and var_off+ranges)
      a single sequential block of memory, while id and ref_obj_id should
      always be remapped and thus can't be memcmp()'ed.
      
      There are a few places that assume that var_off is after id/ref_obj_id to
      clear out id/ref_obj_id with a single memset(0). These are changed to
      explicitly zero out the id/ref_obj_id fields. Other places are adjusted to
      preserve exact byte-by-byte comparison behavior.
      
      No functional changes.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20221223054921.958283-3-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: teach refsafe() to take into account ID remapping · e8f55fcf
      Andrii Nakryiko authored
      states_equal() check performs ID mapping between old and new states to
      establish a 1-to-1 correspondence between IDs, even if their absolute
      numeric values across two equivalent states differ. This is important
      both for correctness and to avoid unnecessary work when two states are
      equivalent.
      
      With recent changes we partially fixed this logic by maintaining ID map
      across all function frames. This patch also makes refsafe() check take
      into account (and maintain) ID map, making states_equal() behavior more
      optimal and correct.
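
      As a rough sketch of what an ID-map-aware refsafe() could look like:
      the struct names, MAX_REFS and ID_MAP_SIZE below are hypothetical, and
      the reference list is reduced to a plain array of IDs. The idmap is
      passed in by the caller so that it can be shared with the other
      comparisons made for the same pair of states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ID_MAP_SIZE 8	/* hypothetical */
#define MAX_REFS 4	/* hypothetical */

struct id_pair {
	uint32_t old, cur;
};

/* Hypothetical stand-in for the acquired references of one frame. */
struct func_state_sketch {
	int acquired_refs;
	uint32_t ref_ids[MAX_REFS];
};

/* Record or verify the old -> cur correspondence of a single ID. */
static bool check_ids(uint32_t old_id, uint32_t cur_id, struct id_pair *idmap)
{
	if (!old_id != !cur_id)
		return false;
	if (old_id == 0)
		return true;
	for (size_t i = 0; i < ID_MAP_SIZE; i++) {
		if (idmap[i].old == 0) {
			idmap[i].old = old_id;
			idmap[i].cur = cur_id;
			return true;
		}
		if (idmap[i].old == old_id)
			return idmap[i].cur == cur_id;
	}
	return false;
}

static bool refsafe_sketch(const struct func_state_sketch *old,
			   const struct func_state_sketch *cur,
			   struct id_pair *idmap)
{
	if (old->acquired_refs != cur->acquired_refs)
		return false;
	/* compare reference IDs through the map, not by absolute value */
	for (int i = 0; i < old->acquired_refs; i++)
		if (!check_ids(old->ref_ids[i], cur->ref_ids[i], idmap))
			return false;
	return true;
}
```
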
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20221223054921.958283-2-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  3. 22 Dec, 2022 2 commits
  4. 21 Dec, 2022 7 commits
    • selftests/bpf: Add verifier test exercising jit PROBE_MEM logic · 59fe41b5
      Dave Marchevsky authored
      This patch adds a test exercising logic that was fixed / improved in
      the previous patch in the series, as well as general sanity checking for
      jit's PROBE_MEM logic which should've been unaffected by the previous
      patch.
      
      The added verifier test does the following:
      
        * Acquire a referenced kptr to struct prog_test_ref_kfunc using
          existing net/bpf/test_run.c kfunc
          * Helper returns ptr to a specific prog_test_ref_kfunc whose first
            two fields - both ints - have been prepopulated w/ vals 42 and
            108, respectively
        * kptr_xchg the acquired ptr into an arraymap
        * Do a direct map_value load of the just-added ptr
          * Goal of all this setup is to get an unreferenced kptr pointing to
            struct with ints of known value, which is the result of this step
        * Using unreferenced kptr obtained in previous step, do loads of
          prog_test_ref_kfunc.a (offset 0) and .b (offset 4)
        * Then incr the kptr by 8 and load prog_test_ref_kfunc.a again (this
          time at offset -8)
        * Add all the loaded ints together and return
      
      Before the PROBE_MEM fixes in the previous patch, the loads at offset 0
      and 4 would succeed, while the load at offset -8 would incorrectly fail
      the runtime check emitted by the JIT and 0 out dst reg as a result. This
      is confirmed by a retval of 150 for this test before the previous patch
      (since the second .a read is 0'd out) and a retval of 192 with the
      fixed logic.
      
      The test exercises the two optimizations to fixed logic added in last
      patch as well:
      
        * First load, with insn "r8 = *(u32 *)(r9 + 0)" exercises "insn->off
          is 0, no need to add / sub from src_reg" optimization
        * Third load, with insn "r9 = *(u32 *)(r9 - 8)" exercises "src_reg ==
          dst_reg, no need to restore src_reg after load" optimization
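
      The expected return values can be reproduced with a small userspace
      model of the three loads. This is not the selftest itself:
      struct ref_kfunc_sketch, probe_load() and test_retval() are
      hypothetical names, and a failed runtime check is modelled simply as
      "the destination is zeroed".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the first two fields of prog_test_ref_kfunc. */
struct ref_kfunc_sketch {
	int32_t a;	/* offset 0, prepopulated with 42 */
	int32_t b;	/* offset 4, prepopulated with 108 */
};

/* Model of a PROBE_MEM load: a failed runtime check zeroes the dst reg. */
static int32_t probe_load(const unsigned char *base, long off,
			  bool check_passes)
{
	int32_t val = 0;

	if (check_passes)
		memcpy(&val, base + off, sizeof(val));
	return val;
}

static int32_t test_retval(const struct ref_kfunc_sketch *obj,
			   bool neg_off_check_passes)
{
	const unsigned char *p = (const unsigned char *)obj;
	int32_t sum;

	sum = probe_load(p, 0, true);	/* .a at offset 0 */
	sum += probe_load(p, 4, true);	/* .b at offset 4 */
	p += 8;				/* incr the kptr by 8 */
	/* .a again, this time at offset -8 */
	sum += probe_load(p, -8, neg_off_check_passes);
	return sum;
}
```

      With the negative-offset load zeroed the sum is 42 + 108 + 0 = 150;
      with it succeeding the sum is 42 + 108 + 42 = 192.
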
      Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20221216214319.3408356-2-davemarchevsky@fb.com
    • bpf, x86: Improve PROBE_MEM runtime load check · 90156f4b
      Dave Marchevsky authored
      This patch rewrites the runtime PROBE_MEM check insns emitted by the BPF
      JIT in order to ensure load safety. The changes in the patch fix two
      issues with the previous logic and more generally improve size of
      emitted code. Paragraphs between this one and "FIX 1" below explain the
      purpose of the runtime check and examine the current implementation.
      
      When a load is marked PROBE_MEM - e.g. due to PTR_UNTRUSTED access - the
      address being loaded from is not necessarily valid. The BPF jit sets up
      exception handlers for each such load which catch page faults and 0 out
      the destination register.
      
      Arbitrary register-relative loads can escape this exception handling
      mechanism. Specifically, a load like dst_reg = *(src_reg + off) will not
      trigger BPF exception handling if (src_reg + off) is outside of kernel
      address space, resulting in an uncaught page fault. A concrete example
      of such behavior is a program like:
      
        struct result {
          char space[40];
          long a;
        };
      
        /* if err, returns ERR_PTR(-EINVAL) */
        struct result *ptr = get_ptr_maybe_err();
        long x = ptr->a;
      
      If get_ptr_maybe_err returns ERR_PTR(-EINVAL) and the result isn't
      checked for err, 'ptr' will be (u64)-EINVAL, a number close to
      U64_MAX. The address of the ptr->a load will be > U64_MAX and will wrap
      over to a small positive u64, which will be in userspace and thus not
      covered by BPF exception handling mechanism.
      
      In order to prevent such loads from occurring, the BPF jit emits some
      instructions which do runtime checking of (src_reg + off) and skip the
      actual load if it's out of range. As an example, here are instructions
      emitted for a %rdi = *(%rdi + 0x10) PROBE_MEM load:
      
        72:   movabs $0x800000000010,%r11 --|
        7c:   cmp    %r11,%rdi              |- 72 - 7f: Check 1
        7f:   jb     0x000000000000008d   --|
        81:   mov    %rdi,%r11             -----|
        84:   add    $0x0000000000000010,%r11   |- 81-8b: Check 2
        8b:   jnc    0x0000000000000091    -----|
        8d:   xor    %edi,%edi             ---- 0 out dest
        8f:   jmp    0x0000000000000095
        91:   mov    0x10(%rdi),%rdi       ---- Actual load
        95:
      
      The JIT considers kernel address space to start at MAX_TASK_SIZE +
      PAGE_SIZE. Determining whether a load will be outside of kernel address
      space should be a simple check:
      
        (src_reg + off) >= MAX_TASK_SIZE + PAGE_SIZE
      
      But because there is only one spare register when the checking logic is
      emitted, this logic is split into two checks:
      
        Check 1: src_reg >= (MAX_TASK_SIZE + PAGE_SIZE - off)
        Check 2: src_reg + off doesn't wrap over U64_MAX and result in small pos u64
      
      Emitted insns implementing Checks 1 and 2 are annotated in the above
      example. Check 1 can be done with a single spare register since the
      source reg by definition is the left-hand-side of the inequality.
      Since adding 'off' to both sides of Check 1's inequality results in the
      original inequality we want, it's equivalent to testing that inequality.
      Except in the case where src_reg + off wraps past U64_MAX, which is why
      Check 2 needs to actually add src_reg + off if Check 1 passes - again
      using the single spare reg.
      
      FIX 1: The Check 1 inequality listed above is not what current code is
      doing. Current code is a bit more pessimistic, instead checking:
      
        src_reg >= (MAX_TASK_SIZE + PAGE_SIZE + abs(off))
      
      The 0x800000000010 in above example is from this current check. If Check
      1 was corrected to use the correct right-hand-side, the value would be
      0x7ffffffffff0. This patch changes the checking logic more broadly (FIX
      2 below will elaborate), fixing this issue as a side-effect of the
      rewrite. Regardless, it's important to understand why Check 1 should've
      been doing MAX_TASK_SIZE + PAGE_SIZE - off before proceeding.
      
      FIX 2: Current code relies on a 'jnc' to determine whether src_reg + off
      addition wrapped over. For negative offsets this logic is incorrect.
      Consider Check 2 insns emitted when off = -0x10:
      
        81:   mov    %rdi,%r11
        84:   add    $0xfffffffffffffff0,%r11
        8b:   jnc    0x0000000000000091
      
      2's complement representation of -0x10 is a large positive u64. Any
      value of src_reg that passes Check 1 will result in carry flag being set
      after (src_reg + off) addition. So a load with any negative offset will
      always fail Check 2 at runtime and never do the actual load. This patch
      fixes the negative offset issue by rewriting both checks in order to not
      rely on carry flag.
      
      The rewrite takes advantage of the fact that, while we only have one
      scratch reg to hold arbitrary values, we know the offset at JIT time.
      Thus we can use src_reg as a temporary scratch reg to hold src_reg +
      offset since we can return it to its original value by later subtracting
      offset. As a result we can directly check the original inequality we
      care about:
      
        (src_reg + off) >= MAX_TASK_SIZE + PAGE_SIZE
      
      For a load like %rdi = *(%rsi + -0x10), this results in emitted code:
      
        43:   movabs $0x800000000000,%r11
        4d:   add    $0xfffffffffffffff0,%rsi --- src_reg += off
        54:   cmp    %r11,%rsi                --- Check original inequality
        57:   jae    0x000000000000005d
        59:   xor    %edi,%edi
        5b:   jmp    0x0000000000000061
        5d:   mov    0x0(%rsi),%rdi           --- Actual Load
        61:   sub    $0xfffffffffffffff0,%rsi --- src_reg -= off
      
      Note that the actual load is always done with offset 0, since previous
      insns have already done src_reg += off. Regardless of whether the new
      check succeeds or fails, insn 61 is always executed, returning src_reg
      to its original value.
      
      Because the goal of these checks is to ensure that loaded-from address
      will be protected by BPF exception handler, the new check can safely
      ignore any wrapover from insn 4d. If such wrapped-over address passes
      insn 54 + 57's cmp-and-jmp it will have such protection so the load can
      proceed.
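
      The difference between the old and new checks can be modelled in
      plain C. This is a sketch of the logic as described in this message,
      not the JIT code: KADDR_LIMIT is taken from the 0x800000000000
      immediate in the listings above, the function names are hypothetical,
      and the x86 carry flag is modelled as unsigned wraparound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed value of MAX_TASK_SIZE + PAGE_SIZE, per the listings above. */
#define KADDR_LIMIT 0x800000000000ULL

static uint64_t abs_off(int32_t off)
{
	return off < 0 ? (uint64_t)(-(int64_t)off) : (uint64_t)off;
}

/* Old logic: pessimistic Check 1, then carry-flag-based Check 2. */
static bool old_check_allows_load(uint64_t src, int32_t off)
{
	uint64_t uoff = (uint64_t)(int64_t)off;	/* sign-extended immediate */
	uint64_t sum;

	if (src < KADDR_LIMIT + abs_off(off))	/* Check 1: jb -> zero dst */
		return false;
	sum = src + uoff;			/* add $imm,%r11 */
	return sum >= src;			/* jnc: load only if no carry */
}

/* New logic: compare the (possibly wrapped) sum against the limit. */
static bool new_check_allows_load(uint64_t src, int32_t off)
{
	return src + (uint64_t)(int64_t)off >= KADDR_LIMIT;
}
```

      For a kernel address such as 0xffff888000001000 and off = -0x10, the
      old check rejects the load because the addition always carries, while
      the new check accepts it. For src = (u64)-EINVAL and off = 40, both
      reject the load, since the sum wraps to 18.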
      
      IMPROVEMENTS: The above improved logic is 8 insns vs original logic's 9,
      and has 1 fewer jmp. The number of checking insns can be further
      improved in common scenarios:
      
      If src_reg == dst_reg, the actual load insn will clobber src_reg, so
      there's no original src_reg state for the sub insn immediately following
      the load to restore, so it can be omitted. In fact, it must be omitted
      since it would incorrectly subtract from the result of the load if it
      wasn't. So for src_reg == dst_reg, JIT emits these insns:
      
        3c:   movabs $0x800000000000,%r11
        46:   add    $0xfffffffffffffff0,%rdi
        4d:   cmp    %r11,%rdi
        50:   jae    0x0000000000000056
        52:   xor    %edi,%edi
        54:   jmp    0x000000000000005a
        56:   mov    0x0(%rdi),%rdi
        5a:
      
      The only difference from larger example being the omitted sub, which
      would've been insn 5a in this example.
      
      If offset == 0, we can similarly omit the sub as in previous case, since
      there's nothing added to subtract. For the same reason we can omit the
      addition as well, resulting in JIT emitting these insns:
      
        46:   movabs $0x800000000000,%r11
        4d:   cmp    %r11,%rdi
        50:   jae    0x0000000000000056
        52:   xor    %edi,%edi
        54:   jmp    0x000000000000005a
        56:   mov    0x0(%rdi),%rdi
        5a:
      
      Although the above example also has src_reg == dst_reg, the same
      offset == 0 optimization is valid to apply if src_reg != dst_reg.
      
      To summarize the improvements in emitted insn count for the
      check-and-load:
      
      BEFORE:                8 check insns, 3 jmps
      AFTER (general case):  7 check insns, 2 jmps (12.5% fewer insns, 33% fewer jmps)
      AFTER (src == dst):    6 check insns, 2 jmps (25% fewer insn)
      AFTER (offset == 0):   5 check insns, 2 jmps (37.5% fewer insn)
      
      (Above counts don't include the 1 load insn, just checking around it)
      
      Based on BPF bytecode + JITted x86 insn I saw while experimenting with
      these improvements, I expect the src_reg == dst_reg case to occur most
      often, followed by offset == 0, then the general case.
      Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20221216214319.3408356-1-davemarchevsky@fb.com
    • libbpf: start v1.2 development cycle · 4ec38eda
      Andrii Nakryiko authored
      Bump current version for new development cycle to v1.2.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Stanislav Fomichev <sdf@google.com>
      Link: https://lore.kernel.org/r/20221221180049.853365-1-andrii@kernel.org
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
    • bpf: Reduce smap->elem_size · 552d42a3
      Martin KaFai Lau authored
      'struct bpf_local_storage_elem' has an unused 56 byte padding at the
      end due to struct's cache-line alignment requirement. This padding
      space is overlapped by storage value contents, so if we use sizeof()
      to calculate the total size, we overinflate it by 56 bytes. Use
      offsetof() instead to calculate more exact memory use.
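
      The effect can be reproduced with a mock layout. Everything below is
      a hypothetical stand-in chosen so that the numbers match the 56 bytes
      mentioned above (a 64-byte cache line, 56 bytes of bookkeeping, an
      8-byte map pointer); it relies on the GNU aligned attribute and on a
      nested flexible array member, which GCC and Clang accept.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE 64	/* assumed cache-line size */

/* Hypothetical stand-in for the per-element storage data. */
struct sdata_sketch {
	uint64_t smap;		/* stand-in for the 8-byte map pointer */
	uint8_t data[];		/* value bytes start here */
};

/* Hypothetical stand-in for struct bpf_local_storage_elem. */
struct elem_sketch {
	uint8_t bookkeeping[56];	/* list nodes, pointers, rcu head */
	struct sdata_sketch sdata __attribute__((aligned(CACHELINE)));
};

/* Old calculation: counts the tail padding that the value overlaps. */
static size_t elem_size_sizeof(size_t value_size)
{
	return sizeof(struct elem_sketch) + value_size;
}

/* New calculation: starts counting where the value bytes really begin. */
static size_t elem_size_offsetof(size_t value_size)
{
	return offsetof(struct elem_sketch, sdata.data) + value_size;
}
```

      In this layout sdata starts at offset 64 and data at offset 72, while
      sizeof() rounds the struct up to 128 bytes, so the sizeof()-based
      result is 56 bytes larger for any value size.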
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Yonghong Song <yhs@fb.com>
      Acked-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20221221013036.3427431-1-martin.lau@linux.dev
    • Merge branch 'bpftool: improve error handing for missing .BTF section' · 7b43df6c
      Andrii Nakryiko authored
      Changbin Du says:
      
      ====================
      Display error message for missing ".BTF" section and clean up empty
      vmlinux.h file.
      
      v3:
       - fix typo and make error message consistent. (Andrii Nakryiko)
       - split out perf change.
      v2:
       - remove vmlinux specific error info.
       - use builtin target .DELETE_ON_ERROR: to delete empty vmlinux.h
      ====================
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    • bpf: makefiles: Do not generate empty vmlinux.h · e7f0d5cd
      Changbin Du authored
      Remove the empty vmlinux.h if bpftool failed to dump btf info.
      The empty vmlinux.h can hide real error when reading output
      of make.
      
      This is done by adding .DELETE_ON_ERROR special target in related
      makefiles.
      Signed-off-by: Changbin Du <changbin.du@gmail.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Quentin Monnet <quentin@isovalent.com>
      Link: https://lore.kernel.org/bpf/20221217223509.88254-3-changbin.du@gmail.com
    • libbpf: Show error info about missing ".BTF" section · e6b4e1d7
      Changbin Du authored
      Show the real problem instead of just saying "No such file or directory".
      
      With this change, the following is printed:
      libbpf: failed to find '.BTF' ELF section in /home/changbin/work/linux/vmlinux
      Error: failed to load BTF from /home/changbin/work/linux/vmlinux: No such file or directory
      Signed-off-by: Changbin Du <changbin.du@gmail.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20221217223509.88254-2-changbin.du@gmail.com
  5. 20 Dec, 2022 2 commits
  6. 19 Dec, 2022 9 commits
  7. 15 Dec, 2022 2 commits
  8. 14 Dec, 2022 7 commits
  9. 13 Dec, 2022 1 commit
    • Merge tag 'net-next-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next · 7e68dd7d
      Linus Torvalds authored
      Pull networking updates from Paolo Abeni:
       "Core:
      
         - Allow live renaming when an interface is up
      
         - Add retpoline wrappers for tc, improving considerably the
           performances of complex queue discipline configurations
      
         - Add inet drop monitor support
      
         - A few GRO performance improvements
      
         - Add infrastructure for atomic dev stats, addressing long standing
           data races
      
         - De-duplicate common code between OVS and conntrack offloading
           infrastructure
      
         - A bunch of UBSAN_BOUNDS/FORTIFY_SOURCE improvements
      
         - Netfilter: introduce packet parser for tunneled packets
      
         - Replace IPVS timer-based estimators with kthreads to scale up the
           workload with the number of available CPUs
      
         - Add the helper support for connection-tracking OVS offload
      
        BPF:
      
         - Support for user defined BPF objects: the use case is to allocate
           own objects, build own object hierarchies and use the building
           blocks to build own data structures flexibly, for example, linked
           lists in BPF
      
         - Make cgroup local storage available to non-cgroup attached BPF
           programs
      
         - Avoid unnecessary deadlock detection and failures wrt BPF task
           storage helpers
      
         - A relevant bunch of BPF verifier fixes and improvements
      
         - Veristat tool improvements to support custom filtering, sorting,
           and replay of results
      
         - Add LLVM disassembler as default library for dumping JITed code
      
         - Lots of new BPF documentation for various BPF maps
      
         - Add bpf_rcu_read_{,un}lock() support for sleepable programs
      
         - Add RCU grace period chaining to BPF to wait for the completion of
           access from both sleepable and non-sleepable BPF programs
      
         - Add support for storing struct task_struct objects as kptrs in maps
      
         - Improve helper UAPI by explicitly defining BPF_FUNC_xxx integer
           values
      
         - Add libbpf *_opts API-variants for bpf_*_get_fd_by_id() functions
      
        Protocols:
      
         - TCP: implement Protective Load Balancing across switch links
      
         - TCP: allow dynamically disabling TCP-MD5 static key, reverting back
           to fast[er]-path
      
         - UDP: Introduce optional per-netns hash lookup table
      
         - IPv6: simplify and cleanup sockets disposal
      
         - Netlink: support different type policies for each generic netlink
           operation
      
         - MPTCP: add MSG_FASTOPEN and FastOpen listener side support
      
         - MPTCP: add netlink notification support for listener sockets events
      
         - SCTP: add VRF support, allowing sctp sockets binding to VRF devices
      
         - Add bridging MAC Authentication Bypass (MAB) support
      
         - Extensions for Ethernet VPN bridging implementation to better
           support multicast scenarios
      
         - More work for Wi-Fi 7 support, comprising conversion of all the
           existing drivers to internal TX queue usage
      
         - IPSec: introduce a new offload type (packet offload) allowing
           complete header processing and crypto offloading
      
         - IPSec: extended ack support for more descriptive XFRM error
           reporting
      
         - RXRPC: increase SACK table size and move processing into a
           per-local endpoint kernel thread, reducing considerably the
           required locking
      
         - IEEE 802154: synchronous send frame and extended filtering support,
           initial support for scanning available 15.4 networks
      
         - Tun: bump the link speed from 10Mbps to 10Gbps
      
         - Tun/VirtioNet: implement UDP segmentation offload support
      
        Driver API:
      
         - PHY/SFP: improve power level switching between standard level 1 and
           the higher power levels
      
         - New API for netdev <-> devlink_port linkage
      
         - PTP: convert existing drivers to new frequency adjustment
           implementation
      
         - DSA: add support for rx offloading
      
         - Autoload DSA tagging driver when dynamically changing protocol
      
         - Add new PCP and APPTRUST attributes to Data Center Bridging
      
         - Add configuration support for 800Gbps link speed
      
         - Add devlink port function attribute to enable/disable RoCE and
           migratable
      
         - Extend devlink-rate to support strict priority and weighted fair
           queuing
      
         - Add devlink support to directly reading from region memory
      
         - New device tree helper to fetch MAC address from nvmem
      
         - New big TCP helper to simplify temporary header stripping
      
        New hardware / drivers:
      
         - Ethernet:
            - Marvell Octeon CNF95N and CN10KB Ethernet Switches
            - Marvell Prestera AC5X Ethernet Switch
            - WangXun 10 Gigabit NIC
            - Motorcomm yt8521 Gigabit Ethernet
            - Microchip ksz9563 Gigabit Ethernet Switch
            - Microsoft Azure Network Adapter
            - Linux Automation 10Base-T1L adapter
      
         - PHY:
            - Aquantia AQR112 and AQR412
            - Motorcomm YT8531S
      
         - PTP:
            - Orolia ART-CARD
      
         - WiFi:
            - MediaTek Wi-Fi 7 (802.11be) devices
            - RealTek rtw8821cu, rtw8822bu, rtw8822cu and rtw8723du USB
              devices
      
         - Bluetooth:
            - Broadcom BCM4377/4378/4387 Bluetooth chipsets
            - Realtek RTL8852BE and RTL8723DS
            - Cypress CYW4373A0 WiFi + Bluetooth combo device
      
        Drivers:
      
         - CAN:
            - gs_usb: bus error reporting support
            - kvaser_usb: listen only and bus error reporting support
      
         - Ethernet NICs:
            - Intel (100G):
               - extend action skbedit to RX queue mapping
               - implement devlink-rate support
               - support direct read from memory
            - nVidia/Mellanox (mlx5):
               - SW steering improvements, increasing rules update rate
               - Support for enhanced events compression
               - extend H/W offload packet manipulation capabilities
               - implement IPSec packet offload mode
            - nVidia/Mellanox (mlx4):
               - better big TCP support
            - Netronome Ethernet NICs (nfp):
               - IPsec offload support
               - add support for multicast filter
            - Broadcom:
               - RSS and PTP support improvements
            - AMD/SolarFlare:
               - netlink extended ack improvements
               - add basic flower matches to offload, and related stats
            - Virtual NICs:
               - ibmvnic: introduce affinity hint support
            - small / embedded:
               - FreeScale fec: add initial XDP support
               - Marvell mv643xx_eth: support MII/GMII/RGMII modes for Kirkwood
               - TI am65-cpsw: add suspend/resume support
               - Mediatek MT7986: add RX Wireless Ethernet Dispatch support
               - Realtek 8169: enable GRO software interrupt coalescing by
                 default
      
         - Ethernet high-speed switches:
            - Microchip (sparx5):
               - add support for Sparx5 TC/flower H/W offload via VCAP
            - Mellanox mlxsw:
               - add 802.1X and MAC Authentication Bypass offload support
               - add ip6gre support
      
         - Embedded Ethernet switches:
            - Mediatek (mtk_eth_soc):
               - improve PCS implementation, add DSA untag support
               - enable flow offload support
            - Renesas:
               - add rswitch R-Car Gen4 gPTP support
            - Microchip (lan966x):
               - add full XDP support
               - add TC H/W offload via VCAP
               - enable PTP on bridge interfaces
            - Microchip (ksz8):
               - add MTU support for KSZ8 series
      
         - Qualcomm 802.11ax WiFi (ath11k):
            - support configuring channel dwell time during scan
      
         - MediaTek WiFi (mt76):
            - enable Wireless Ethernet Dispatch (WED) offload support
            - add ack signal support
            - enable coredump support
            - remain_on_channel support
      
         - Intel WiFi (iwlwifi):
            - enable Wi-Fi 7 Extremely High Throughput (EHT) PHY capabilities
            - 320 MHz channels support
      
         - RealTek WiFi (rtw89):
            - new dynamic header firmware format support
            - wake-over-WLAN support"
      
      * tag 'net-next-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2002 commits)
        ipvs: fix type warning in do_div() on 32 bit
        net: lan966x: Remove a useless test in lan966x_ptp_add_trap()
        net: ipa: add IPA v4.7 support
        dt-bindings: net: qcom,ipa: Add SM6350 compatible
        bnxt: Use generic HBH removal helper in tx path
        IPv6/GRO: generic helper to remove temporary HBH/jumbo header in driver
        selftests: forwarding: Add bridge MDB test
        selftests: forwarding: Rename bridge_mdb test
        bridge: mcast: Support replacement of MDB port group entries
        bridge: mcast: Allow user space to specify MDB entry routing protocol
        bridge: mcast: Allow user space to add (*, G) with a source list and filter mode
        bridge: mcast: Add support for (*, G) with a source list and filter mode
        bridge: mcast: Avoid arming group timer when (S, G) corresponds to a source
        bridge: mcast: Add a flag for user installed source entries
        bridge: mcast: Expose __br_multicast_del_group_src()
        bridge: mcast: Expose br_multicast_new_group_src()
        bridge: mcast: Add a centralized error path
        bridge: mcast: Place netlink policy before validation functions
        bridge: mcast: Split (*, G) and (S, G) addition into different functions
        bridge: mcast: Do not derive entry type from its filter mode
        ...
      7e68dd7d