1. 28 May, 2021 5 commits
    • netfilter: nf_tables: prefer direct calls for set lookups · f227925e
      Florian Westphal authored
      Extend nft_set_do_lookup() to use direct calls when retpoline feature
      is enabled.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      f227925e
    • netfilter: add and use nft_set_do_lookup helper · 0974cff3
      Florian Westphal authored
      Followup patch will add a CONFIG_RETPOLINE wrapper to avoid
      the ops->lookup() indirection cost for retpoline builds.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      0974cff3
    • netfilter: nft_set_pipapo_avx2: Skip LDMXCSR, we don't need a valid MXCSR state · a58db7ad
      Stefano Brivio authored
      We don't need a valid MXCSR state for the lookup routines, none of
      the instructions we use rely on or affect any bit in the MXCSR
      register.
      
      Instead of calling kernel_fpu_begin(), we can pass 0 as mask to
      kernel_fpu_begin_mask() and spare one LDMXCSR instruction.
      
      Commit 49200d17 ("x86/fpu/64: Don't FNINIT in kernel_fpu_begin()")
      already speeds up lookups considerably, and by dropping the MXCSR
      initialisation we can now get a much smaller, but measurable, increase
      in matching rates.
      
      The table below reports matching rates and a wild approximation of
      clock cycles needed for a match in a "port,net" test with 10 entries
      from selftests/netfilter/nft_concat_range.sh, limited to the first
      field, i.e. the port (with nft_set_rbtree initialisation skipped), run
      on a single AMD Epyc 7351 thread (2.9GHz, 512 KiB L1D$, 8 MiB L2$).
      
      The (very rough) estimation of clock cycles is obtained by simply
      dividing frequency by matching rate. The "cycles spared" column refers
      to the difference in cycles compared to the previous row, and the rate
      increase also refers to the previous row. Results are averages of six
      runs.
      
      Merely for context, I'm also reporting packet rates obtained by
      skipping kernel_fpu_begin() and kernel_fpu_end() altogether (which
      shows a very limited impact now), as well as skipping the whole lookup
      function, compared to simply counting and dropping all packets using
      the netdev hook drop (see nft_concat_range.sh for details). This
      workload also includes packet generation with pktgen and the receive
      path of veth.
      
                                            |matching|  est.  | cycles |  rate  |
                                            |  rate  | cycles | spared |increase|
                                            | (Mpps) |        |        |        |
      --------------------------------------|--------|--------|--------|--------|
      FNINIT, LDMXCSR (before 49200d17)     |  5.245 |    553 |      - |      - |
      LDMXCSR only (with 49200d17)          |  6.347 |    457 |     96 |  21.0% |
      Without LDMXCSR (this patch)          |  6.461 |    449 |      8 |   1.8% |
      -------- for reference only: ---------|--------|--------|--------|--------|
      Without kernel_fpu_begin()            |  6.513 |    445 |      4 |   0.8% |
      Without actual matching (return true) |  7.649 |    379 |     66 |  17.4% |
      Without lookup operation (netdev drop)| 10.320 |    281 |     98 |  34.9% |
      
      The clock cycles spared by avoiding LDMXCSR appear to be in line with CPI
      and latency indicated in the manuals of comparable architectures: Intel
      Skylake (CPI: 1, latency: 7) and AMD 12h (latency: 12) -- I couldn't find
      this information for AMD 17h.
      Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      a58db7ad
    • netfilter: nft_exthdr: Support SCTP chunks · 133dc203
      Phil Sutter authored
      Chunks are SCTP header extensions similar in implementation to IPv6
      extension headers or TCP options. Reusing exthdr expression to find and
      extract field values from them is therefore pretty straightforward.
      
      For now, this supports extracting data from chunks at a fixed offset
      (and length) only - chunks themselves are an extensible data structure;
      in order to make all fields available, a nested extension search is
      needed.
      Signed-off-by: Phil Sutter <phil@nwl.cc>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      133dc203
    • Merge tag 'mlx5-updates-2021-05-26' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · af9207ad
      Jakub Kicinski authored
      Saeed Mahameed says:
      
      ====================
      mlx5-updates-2021-05-26
      
      Misc updates for the mlx5 driver:
      
      1) Clean up patches for lag and SF
      
      2) Reserve bit 31 in steering register C1 for IPSec offload usage
      
      3) Move steering tables pool logic into the steering core and
        increase the maximum table size to 2G entries when software steering
        is enabled.
      
      * tag 'mlx5-updates-2021-05-26' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
        net/mlx5: Fix lag port remapping logic
        net/mlx5: Use boolean arithmetic to evaluate roce_lag
        net/mlx5: Remove unnecessary spin lock protection
        net/mlx5: Cap the maximum flow group size to 16M entries
        net/mlx5: DR, Set max table size to 2G entries
        net/mlx5: Move chains ft pool to be used by all firmware steering
        net/mlx5: Move table size calculation to steering cmd layer
        net/mlx5: Add case for FS_FT_NIC_TX FT in MLX5_CAP_FLOWTABLE_TYPE
        net/mlx5: DR, Remove unused field of send_ring struct
        net/mlx5e: RX, Remove unnecessary check in RX CQE compression handling
        net/mlx5e: IPsec/rep_tc: Fix rep_tc_update_skb drops IPsec packet
        net/mlx5e: TC: Reserved bit 31 of REG_C1 for IPsec offload
        net/mlx5e: TC: Use bit counts for register mapping
        net/mlx5: CT: Avoid reusing modify header context for natted entries
        net/mlx5e: CT, Remove newline from ct_dbg call
      ====================
      
      Link: https://lore.kernel.org/r/20210527185624.694304-1-saeed@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      af9207ad
  2. 27 May, 2021 35 commits