1. 04 Jan, 2024 19 commits
  2. 03 Jan, 2024 16 commits
    • Merge branch 'bpf-volatile-compare' · b4560055
      Andrii Nakryiko authored
      Alexei Starovoitov says:
      
      ====================
      bpf: volatile compare
      
      From: Alexei Starovoitov <ast@kernel.org>
      
      v2->v3:
      Debugged profiler.c regression. It was caused by basic block layout.
      Introduce bpf_cmp_likely() and bpf_cmp_unlikely() macros.
      Debugged redundant <<=32, >>=32 with u32 variables. Added cast workaround.
      
      v1->v2:
      Fixed issues pointed out by Daniel, added more tests, attempted to convert profiler.c,
      but barrier_var() wins vs bpf_cmp(). To be investigated.
      Patches 1-4 are good to go, but 5 needs more work.
      ====================
      
      Link: https://lore.kernel.org/r/20231226191148.48536-1-alexei.starovoitov@gmail.com
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    • selftests/bpf: Convert profiler.c to bpf_cmp. · 7e3811cb
      Alexei Starovoitov authored
      Convert profiler[123].c to "volatile compare" to compare the barrier_var() approach against bpf_cmp_likely() and bpf_cmp_unlikely().
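
      A conversion of this kind looks roughly as follows (hypothetical variable
      names, not an exact hunk from the patch):

        -       barrier_var(len);
        -       if (len <= MAX_LEN) {
        +       if (bpf_cmp_likely(len, <=, MAX_LEN)) {
                        payload += len;
                }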
      
      bpf_cmp_unlikely() produces correct code, but takes much longer to verify:
      
      ./veristat -C -e prog,insns,states before after_with_unlikely
      Program                               Insns (A)  Insns (B)  Insns       (DIFF)  States (A)  States (B)  States     (DIFF)
      ------------------------------------  ---------  ---------  ------------------  ----------  ----------  -----------------
      kprobe__proc_sys_write                     1603      19606  +18003 (+1123.08%)         123        1678  +1555 (+1264.23%)
      kprobe__vfs_link                          11815      70305   +58490 (+495.05%)         971        4967   +3996 (+411.53%)
      kprobe__vfs_symlink                        5464      42896   +37432 (+685.07%)         434        3126   +2692 (+620.28%)
      kprobe_ret__do_filp_open                   5641      44578   +38937 (+690.25%)         446        3162   +2716 (+608.97%)
      raw_tracepoint__sched_process_exec         2770      35962  +33192 (+1198.27%)         226        3121  +2895 (+1280.97%)
      raw_tracepoint__sched_process_exit         1526       2135      +609 (+39.91%)         133         208      +75 (+56.39%)
      raw_tracepoint__sched_process_fork          265        337       +72 (+27.17%)          19          24       +5 (+26.32%)
      tracepoint__syscalls__sys_enter_kill      18782     140407  +121625 (+647.56%)        1286       12176  +10890 (+846.81%)
      
      bpf_cmp_likely() is equivalent to barrier_var():
      
      ./veristat -C -e prog,insns,states before after_with_likely
      Program                               Insns (A)  Insns (B)  Insns   (DIFF)  States (A)  States (B)  States (DIFF)
      ------------------------------------  ---------  ---------  --------------  ----------  ----------  -------------
      kprobe__proc_sys_write                     1603       1663    +60 (+3.74%)         123         127    +4 (+3.25%)
      kprobe__vfs_link                          11815      12090   +275 (+2.33%)         971         971    +0 (+0.00%)
      kprobe__vfs_symlink                        5464       5448    -16 (-0.29%)         434         426    -8 (-1.84%)
      kprobe_ret__do_filp_open                   5641       5739    +98 (+1.74%)         446         446    +0 (+0.00%)
      raw_tracepoint__sched_process_exec         2770       2608   -162 (-5.85%)         226         216   -10 (-4.42%)
      raw_tracepoint__sched_process_exit         1526       1526     +0 (+0.00%)         133         133    +0 (+0.00%)
      raw_tracepoint__sched_process_fork          265        265     +0 (+0.00%)          19          19    +0 (+0.00%)
      tracepoint__syscalls__sys_enter_kill      18782      18970   +188 (+1.00%)        1286        1286    +0 (+0.00%)
      kprobe__proc_sys_write                     2700       2809   +109 (+4.04%)         107         109    +2 (+1.87%)
      kprobe__vfs_link                          12238      12366   +128 (+1.05%)         267         269    +2 (+0.75%)
      kprobe__vfs_symlink                        7139       7365   +226 (+3.17%)         167         175    +8 (+4.79%)
      kprobe_ret__do_filp_open                   7264       7070   -194 (-2.67%)         180         182    +2 (+1.11%)
      raw_tracepoint__sched_process_exec         3768       3453   -315 (-8.36%)         211         199   -12 (-5.69%)
      raw_tracepoint__sched_process_exit         3138       3138     +0 (+0.00%)          83          83    +0 (+0.00%)
      raw_tracepoint__sched_process_fork          265        265     +0 (+0.00%)          19          19    +0 (+0.00%)
      tracepoint__syscalls__sys_enter_kill      26679      24327  -2352 (-8.82%)        1067        1037   -30 (-2.81%)
      kprobe__proc_sys_write                     1833       1833     +0 (+0.00%)         157         157    +0 (+0.00%)
      kprobe__vfs_link                           9995      10127   +132 (+1.32%)         803         803    +0 (+0.00%)
      kprobe__vfs_symlink                        5606       5672    +66 (+1.18%)         451         451    +0 (+0.00%)
      kprobe_ret__do_filp_open                   5716       5782    +66 (+1.15%)         462         462    +0 (+0.00%)
      raw_tracepoint__sched_process_exec         3042       3042     +0 (+0.00%)         278         278    +0 (+0.00%)
      raw_tracepoint__sched_process_exit         1680       1680     +0 (+0.00%)         146         146    +0 (+0.00%)
      raw_tracepoint__sched_process_fork          299        299     +0 (+0.00%)          25          25    +0 (+0.00%)
      tracepoint__syscalls__sys_enter_kill      18372      18372     +0 (+0.00%)        1558        1558    +0 (+0.00%)
      
      The default (mcpu=v3), no_alu32 and cpuv4 flavors show similar differences.
      
      Note one place where bpf_nop_mov() is used to work around the verifier's lack of a link
      between a scalar register and its spill to the stack.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20231226191148.48536-7-alexei.starovoitov@gmail.com
    • bpf: Add bpf_nop_mov() asm macro. · 0bcc62aa
      Alexei Starovoitov authored
      The bpf_nop_mov(var) asm macro emits a no-op register move: rX = rX.
      If 'var' is a scalar and not a fixed constant, the verifier will assign an ID to it.
      If it's later spilled, the stack slot will carry that ID as well.
      Hence a range-refining comparison like "if rX < const" will update all copies,
      including the spilled slot.
      This macro is a temporary workaround until the verifier gets smarter.
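
      A minimal sketch of such a macro (illustrative; the actual definition in
      bpf_experimental.h may differ slightly):

        /* emit "rX = rX" for the register holding 'var'; the (short) cast is
         * the same workaround for redundant 32-bit shifts mentioned in the
         * bpf_cmp patch
         */
        #define bpf_nop_mov(var) \
                asm volatile("%[reg]=%[reg]"::[reg]"r"((short)var))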
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20231226191148.48536-6-alexei.starovoitov@gmail.com
    • selftests/bpf: Remove bpf_assert_eq-like macros. · 907dbd3e
      Alexei Starovoitov authored
      Since the last user was converted to bpf_cmp, remove the bpf_assert_eq/ne/... macros.
      
      The __bpf_assert_op() macro is kept for experiments, since it's slightly more efficient
      than bpf_assert(bpf_cmp_unlikely()) until LLVM is fixed.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Link: https://lore.kernel.org/bpf/20231226191148.48536-5-alexei.starovoitov@gmail.com
    • selftests/bpf: Convert exceptions_assert.c to bpf_cmp · 624cd2a1
      Alexei Starovoitov authored
      Convert exceptions_assert.c to bpf_cmp_unlikely() macro.
      
      Since
      
      bpf_assert(bpf_cmp_unlikely(var, ==, 100));
      other code;
      
      will generate assembly code:
      
        if r1 == 100 goto L2;
        r0 = 0
        call bpf_throw
      L1:
        other code;
        ...
      
      L2: goto L1;
      
      LLVM generates a redundant basic block with an extra goto. LLVM will be fixed eventually.
      Right now it's less efficient than the __bpf_assert(var, ==, 100) macro, which produces:
        if r1 == 100 goto L1;
        r0 = 0
        call bpf_throw
      L1:
        other code;
      
      But the extra goto doesn't hurt the verification process.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Link: https://lore.kernel.org/bpf/20231226191148.48536-4-alexei.starovoitov@gmail.com
    • bpf: Introduce "volatile compare" macros · a8b242d7
      Alexei Starovoitov authored
      Compilers optimize conditional operators at will, but often BPF programmers
      want to force the compiler to keep the same operator in asm as it's written in C.
      Introduce bpf_cmp_likely/unlikely(var1, conditional_op, var2) macros that can be used as:
      
      -               if (seen >= 1000)
      +               if (bpf_cmp_unlikely(seen, >=, 1000))
      
      The macros take advantage of BPF assembly syntax, which is C-like.
      
      The macros check the signedness of the variable 'seen' and emit either
      a signed or an unsigned compare.
      
      For example:
      int a;
      bpf_cmp_unlikely(a, >, 0) will be translated to 'if rX s> 0 goto' in BPF assembly.
      
      unsigned int a;
      bpf_cmp_unlikely(a, >, 0) will be translated to 'if rX > 0 goto' in BPF assembly.
      
      C type conversions coupled with comparison operators are tricky.
        int i = -1;
        unsigned int j = 1;
        if (i < j) // this is false.
      
        long i = -1;
        unsigned int j = 1;
        if (i < j) // this is true.
      
      Make sure the BPF program is compiled with -Wsign-compare; then the macros will catch
      such mistakes.
      
      The macros check only the LHS (left-hand side) to figure out the signedness of the compare.
      
      'if 0 < rX goto' is not allowed in the assembly, so users
      have to put the variable on the LHS anyway.
      
      The patch updates a few tests to demonstrate the use of the macros.
      
      The macro also allows BPF_JSET to be used from C code, since LLVM doesn't generate it
      at present. For example:
      
      if (i & j) compiles into r0 &= r1; if r0 == 0 goto
      
      while
      
      if (bpf_cmp_unlikely(i, &, j)) compiles into if r0 & r1 goto
      
      Note that the macros have to be careful with the RHS assembly constraint,
      since with:
      u64 __rhs = 1ull << 42;
      asm goto("if r0 < %[rhs] goto +1" :: [rhs] "ri" (__rhs));
      LLVM will silently truncate the 64-bit constant into an s32 imm.
      
      Note that in [lhs] "r"((short)LHS) the type cast is a workaround for an LLVM issue.
      When LHS is exactly 32-bit, LLVM emits redundant <<=32, >>=32 shifts to zero the upper 32 bits.
      When LHS is a 64-, 16- or 8-bit variable there are no shifts.
      When LHS is 32-bit the (u64) cast doesn't help, hence the (short) cast.
      It does _not_ truncate the variable before it's assigned to a register.
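
      Roughly, the core of the unlikely flavor for the relational operators
      (<, <=, >, >=) could be sketched as below. This is illustrative only; the
      real macro in bpf_experimental.h also handles ==, != and & (which never
      take the 's' prefix), is more careful with the RHS constraint as noted
      above, and provides the likely flavor by flipping the operator:

        /* illustrative sketch, not the actual bpf_experimental.h code */
        #define __is_signed_type(type)  (((type)(-1)) < (type)1)

        #define my_cmp_unlikely(LHS, OP, RHS)                                         \
                ({                                                                    \
                        __label__ l_true;                                             \
                        int ret = 1;                                                  \
                        /* the constant condition is folded at compile time */        \
                        if (__is_signed_type(typeof(LHS)))                            \
                                asm volatile goto("if %[lhs] s" #OP " %[rhs] goto %l[l_true]" \
                                        :: [lhs] "r"((short)(LHS)),                   \
                                           [rhs] "ri"((RHS)) :: l_true);              \
                        else                                                          \
                                asm volatile goto("if %[lhs] " #OP " %[rhs] goto %l[l_true]" \
                                        :: [lhs] "r"((short)(LHS)),                   \
                                           [rhs] "ri"((RHS)) :: l_true);              \
                        ret = 0;                                                      \
                l_true:                                                               \
                        ret;                                                          \
                })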
      
      Traditional likely()/unlikely() macros that use __builtin_expect(!!(x), 1 or 0)
      have no effect on these macros, hence the macros implement the logic manually.
      The bpf_cmp_unlikely() macro preserves the compare operator as-is, while
      the bpf_cmp_likely() macro flips the compare.
      
      Consider two cases:
      A.
        for() {
          if (foo >= 10) {
            bar += foo;
          }
          other code;
        }
      
      B.
        for() {
          if (foo >= 10)
             break;
          other code;
        }
      
      It's ok to use either the bpf_cmp_likely or the bpf_cmp_unlikely macro in both cases,
      but consider that 'break' is effectively 'goto out_of_the_loop'.
      Hence it's better to use bpf_cmp_unlikely in case B,
      while in case A 'bar += foo' is better kept as the fallthrough (i.e. likely) code path.
      
      When it's written as:
      A.
        for() {
          if (bpf_cmp_likely(foo, >=, 10)) {
            bar += foo;
          }
          other code;
        }
      
      B.
        for() {
          if (bpf_cmp_unlikely(foo, >=, 10))
             break;
          other code;
        }
      
      The assembly will look like:
      A.
        for() {
          if r1 < 10 goto L1;
            bar += foo;
        L1:
          other code;
        }
      
      B.
        for() {
          if r1 >= 10 goto L2;
          other code;
        }
        L2:
      
      The choice of bpf_cmp_likely vs bpf_cmp_unlikely changes the basic block layout, hence it
      can greatly influence the verification process. The number of processed instructions
      will differ, since the verifier walks the fallthrough path first.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Link: https://lore.kernel.org/bpf/20231226191148.48536-3-alexei.starovoitov@gmail.com
    • selftests/bpf: Attempt to build BPF programs with -Wsign-compare · 495d2d81
      Alexei Starovoitov authored
      GCC's -Wall includes -Wsign-compare while clang's -Wall does not.
      Since BPF programs are built with clang, we need to add this flag explicitly
      to catch problematic comparisons like:
      
        int i = -1;
        unsigned int j = 1;
        if (i < j) // this is false.
      
        long i = -1;
        unsigned int j = 1;
        if (i < j) // this is true.
      
      C standard for reference:
      
      - If either operand is unsigned long the other shall be converted to unsigned long.
      
      - Otherwise, if one operand is a long int and the other unsigned int, then if a
      long int can represent all the values of an unsigned int, the unsigned int
      shall be converted to a long int; otherwise both operands shall be converted to
      unsigned long int.
      
      - Otherwise, if either operand is long, the other shall be converted to long.
      
      - Otherwise, if either operand is unsigned, the other shall be converted to unsigned.
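
      A small user-space demonstration of these rules on a typical 64-bit (LP64)
      target (illustrative, not part of the patch):

        #include <stdio.h>

        int main(void)
        {
                int i = -1;
                unsigned int j = 1;
                long k = -1;

                /* int vs unsigned int: -1 converts to UINT_MAX, so this is false */
                printf("%d\n", i < j);  /* prints 0; -Wsign-compare warns here */
                /* long vs unsigned int: j converts to long, so this is true */
                printf("%d\n", k < j);  /* prints 1 */
                return 0;
        }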
      
      Unfortunately clang's -Wsign-compare is very noisy.
      It complains about (s32)a == (u32)b, which is safe and doesn't have surprising behavior.
      
      This patch fixes some of the issues. It needs a follow-up to fix the rest.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Link: https://lore.kernel.org/bpf/20231226191148.48536-2-alexei.starovoitov@gmail.com
    • Merge branch 'bpf-simplify-checking-size-of-helper-accesses' · a640de4c
      Andrii Nakryiko authored
      Andrei Matei says:
      
      ====================
      bpf: Simplify checking size of helper accesses
      
      v3->v4:
      - kept only the minimal change, undoing debatable changes (Andrii)
      - dropped the second patch from before, with changes to the error
        message (Andrii)
      - extracted the new test into a separate patch (Andrii)
      - added Acked by Andrii
      
      v2->v3:
      - split the error-logging function to a separate patch (Andrii)
      - make the error buffers smaller (Andrii)
      - include size of memory region for PTR_TO_MEM (Andrii)
      - nits from Andrii and Eduard
      
      v1->v2:
      - make the error message include more info about the context of the
        zero-sized access (Andrii)
      ====================
      
      Link: https://lore.kernel.org/r/20231221232225.568730-1-andreimatei1@gmail.com
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    • bpf: Add a possibly-zero-sized read test · 72187506
      Andrei Matei authored
      This patch adds a test for the condition that the previous patch touched:
      an illegal zero-sized helper memory access. Unlike existing tests, this new
      one uses a size whose lower bound is zero, rather than a size known to be zero.
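
      A hypothetical sketch of the pattern (not the actual selftest added here;
      the helper and section name are illustrative assumptions):

        #include <linux/bpf.h>
        #include <bpf/bpf_helpers.h>

        SEC("tc")
        int possibly_zero_sized_read(struct __sk_buff *skb)
        {
                char buf[8];
                __u32 len = bpf_get_prandom_u32() & 7; /* verifier range [0, 7] */

                /* the lower bound of 'len' is zero, so the verifier has to
                 * decide whether a zero-sized access is acceptable for this
                 * helper argument
                 */
                bpf_skb_load_bytes(skb, 0, buf, len);
                return 0;
        }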
      Signed-off-by: Andrei Matei <andreimatei1@gmail.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20231221232225.568730-3-andreimatei1@gmail.com
    • bpf: Simplify checking size of helper accesses · 8a021e7f
      Andrei Matei authored
      This patch simplifies the verification of size arguments associated to
      pointer arguments to helpers and kfuncs. Many helpers take a pointer
      argument followed by the size of the memory access to be performed
      through that pointer. Before this patch, the handling of the
      size argument in check_mem_size_reg() was confusing and wasteful: if the
      size register's lower bound was 0, then the verification was done twice:
      once considering the size of the access to be the lower bound of the
      respective argument, and once considering the upper bound (even if the
      two are the same). The upper-bound checking is a super-set of the
      lower-bound checking(*), except that the only point of the lower-bound check
      is to handle the case where zero-sized accesses are explicitly not
      allowed and the lower bound is zero. This static condition is now
      checked explicitly, replacing a much more complex, expensive and
      confusing verification call to check_helper_mem_access().
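
      Roughly, the new shape of the check is (simplified model with illustrative
      names, not the actual verifier code):

        /* 'reg' is the verifier state of the size register */
        if (!zero_size_allowed && reg->umin_value == 0)
                return -EACCES;  /* possibly-zero size rejected up front */

        /* a single bounds check against the memory region, using the
         * size register's upper bound
         */
        return check_helper_mem_access(env, ptr_regno, reg->umax_value,
                                       zero_size_allowed, meta);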
      
      Error messages change in this patch. Before, messages about illegal
      zero-size accesses depended on the type of the pointer and on other
      conditions, and sometimes the message was plain wrong: in some tests
      that changed you'll see that the old message was something like "R1 min
      value is outside of the allowed memory range", where R1 is the pointer
      register; the error was wrongly claiming that the pointer was bad
      instead of the size being bad. Other times the information that the size
      came from a register with a possible range of values was wrong, and the
      error presented the size as a fixed zero. Now the errors refer to the
      right register. However, the old error messages did contain useful
      information about the pointer register which is now lost; recovering
      this information was deemed not important enough.
      
      (*) Besides standing to reason that the checks for a bigger size access
      are a super-set of the checks for a smaller size access, I have also
      mechanically verified this by reading the code for all types of
      pointers. I could convince myself that it's true for all but
      PTR_TO_BTF_ID (check_ptr_to_btf_access). There, simply looking
      line-by-line does not immediately prove what we want. If anyone has any
      qualms, let me know.
      Signed-off-by: Andrei Matei <andreimatei1@gmail.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20231221232225.568730-2-andreimatei1@gmail.com
    • net/sched: cls_api: complement tcf_tfilter_dump_policy · 2ab1efad
      Lin Ma authored
      In function `tc_dump_tfilter`, the attributes array is parsed via
      tcf_tfilter_dump_policy which only describes TCA_DUMP_FLAGS. However,
      the NLA TCA_CHAIN is also accessed with `nla_get_u32`.
      
      The access to TCA_CHAIN is introduced in commit 5bc17018 ("net:
      sched: introduce multichain support for filters") and no nla_policy is
      provided for parsing at that point. Later on, tcf_tfilter_dump_policy is
      introduced in commit f8ab1807 ("net: sched: introduce terse dump
      flag") while still ignoring the fact that TCA_CHAIN needs a check. This
      patch adds that check by complementing the policy, so that the access
      discussed here is safe, just like the other code paths that use
      rtm_tca_policy as the parsing policy.
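
      The added entry is along these lines (sketch of the change; the
      TCA_DUMP_FLAGS entry already exists):

        static const struct nla_policy tcf_tfilter_dump_policy[TCA_MAX + 1] = {
                [TCA_DUMP_FLAGS] = NLA_POLICY_BITFIELD32(TCA_DUMP_FLAGS_TERSE),
                [TCA_CHAIN]      = { .type = NLA_U32 },
        };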
      Signed-off-by: Lin Ma <linma@zju.edu.cn>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ethtool: Fix symmetric-xor RSS RX flow hash check · 501869fe
      Gerhard Engleder authored
      Commit 13e59344 ("net: ethtool: add support for symmetric-xor RSS hash")
      adds a check to the ethtool set_rxnfc operation, which checks the RX
      flow hash if the flag RXH_XFRM_SYM_XOR is set. This flag is introduced
      with the same commit. It calls the ethtool get_rxfh operation to get the
      RX flow hash data. If get_rxfh is not supported, then EOPNOTSUPP is
      returned.
      
      There are drivers like tsnep, macb, asp2, genet, gianfar, mtk, ... which
      support the ethtool operation set_rxnfc but not get_rxfh. This results
      in EOPNOTSUPP being returned by ethtool_set_rxnfc() without actually calling
      the driver's set_rxnfc operation. Thus, set_rxnfc got broken for all
      these drivers.
      
      Check the RX flow hash in ethtool_set_rxnfc() only if the driver supports
      retrieving the RX flow hash, i.e. implements get_rxfh.
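
      In sketch form (illustrative, not the exact diff; struct and callback
      names assumed from this series):

        if (ops->get_rxfh) {
                struct ethtool_rxfh_param rxfh = {};

                rc = ops->get_rxfh(dev, &rxfh);
                if (rc)
                        return rc;

                if (rxfh.input_xfrm & RXH_XFRM_SYM_XOR) {
                        /* validate that the requested flow-hash fields stay
                         * symmetric (field checks elided in this sketch)
                         */
                }
        }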
      
      Fixes: 13e59344 ("net: ethtool: add support for symmetric-xor RSS hash")
      Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
      Reviewed-by: Ravi Gunasekaran <r-gunasekaran@ti.com>
      Link: https://lore.kernel.org/r/20231226205536.32003-1-gerhard@engleder-embedded.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'bug-fixes-for-rss-symmetric-xor' · 88b8fd97
      Jakub Kicinski authored
      Ahmed Zaki says:
      
      ====================
      Bug fixes for RSS symmetric-xor
      
      A couple of fixes for the symmetric-xor support recently merged into net-next [1].
      
      The first patch copies the xfrm value back to user-space when ethtool is
      built with --disable-netlink. The second allows ethtool to change other
      RSS attributes while not changing the xfrm values.
      
      Link: https://lore.kernel.org/netdev/20231213003321.605376-1-ahmed.zaki@intel.com/ [1]
      ====================
      
      Link: https://lore.kernel.org/r/20231221184235.9192-1-ahmed.zaki@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: ethtool: add a NO_CHANGE uAPI for new RXFH's input_xfrm · 0dd415d1
      Ahmed Zaki authored
      Add a NO_CHANGE uAPI value for the new RXFH/RSS input_xfrm uAPI field.
      This is needed so that user-space can set other RSS values (hkey or indir
      table) without affecting input_xfrm.
      
      Should have been part of [1].
      
      Link: https://lore.kernel.org/netdev/20231213003321.605376-1-ahmed.zaki@intel.com/ [1]
      Fixes: 13e59344 ("net: ethtool: add support for symmetric-xor RSS hash")
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
      Link: https://lore.kernel.org/r/20231221184235.9192-3-ahmed.zaki@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: ethtool: copy input_xfrm to user-space in ethtool_get_rxfh · 7c402f77
      Ahmed Zaki authored
      The ioctl path of ethtool's get_rxfh is missing the final step of
      copying the new input_xfrm field to user-space. This should have been
      part of [1].
      
      Link: https://lore.kernel.org/netdev/20231213003321.605376-1-ahmed.zaki@intel.com/ [1]
      Fixes: 13e59344 ("net: ethtool: add support for symmetric-xor RSS hash")
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
      Link: https://lore.kernel.org/r/20231221184235.9192-2-ahmed.zaki@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  3. 02 Jan, 2024 5 commits