- 28 May, 2021 12 commits
-
Florian Westphal authored
This allows changing the storage placement later on without changing readers.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
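A minimal sketch of the accessor pattern this describes, with hypothetical struct and field names (the commit itself targets nftables internals):

    /* Readers use a helper rather than the field directly, so the
     * backing storage can be moved without touching every caller. */
    struct example_ctx {
            u32 state;                /* today: stored inline */
    };

    static inline u32 example_ctx_state(const struct example_ctx *ctx)
    {
            return ctx->state;        /* later: could read from elsewhere */
    }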
-
Florian Westphal authored
Reduce size from 28 to 24 bytes on 32bit platforms.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Florian Westphal authored
The fragment offset in ipv4/ipv6 is a 16bit field, so use u16 instead of unsigned int. On 64bit: 40 bytes to 32 bytes. By extension this also reduces nft_pktinfo (56 to 48 bytes).

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
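The kind of change involved, sketched on a hypothetical struct (field names are illustrative; pahole can confirm the resulting layout):

    struct pkt_meta {
            unsigned int fragoff;     /* before: 4 bytes for a 16-bit value */
    };

    struct pkt_meta_shrunk {
            u16 fragoff;              /* after: matches the on-wire width and
                                       * lets adjacent members pack tighter */
    };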
-
Pablo Neira Ayuso authored
Replace netlink_unicast() calls by nfnetlink_unicast(), which already deals with translating EAGAIN to ENOBUFS as the nfnetlink core expects. nfnetlink_unicast() calls nlmsg_unicast(), which returns zero in case of success; otherwise the netlink core function netlink_rcv_skb() turns err > 0 into an acknowledgment.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
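A sketch of the error translation described above (a reconstruction, not the verbatim net/netfilter/nfnetlink.c body):

    static int nfnetlink_unicast_sketch(struct sk_buff *skb,
                                        struct sock *nfnl, u32 portid)
    {
            int err = nlmsg_unicast(nfnl, skb, portid);

            if (err == -EAGAIN)
                    err = -ENOBUFS;   /* what the nfnetlink core expects */
            return err;
    }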
-
Yang Li authored
Variable 'ret' is set to zero, but this value is never read as it is overwritten with a new value later on; hence it is a redundant assignment and can be removed.

Clean up the following clang-analyzer warning:

    net/netfilter/xt_CT.c:175:2: warning: Value stored to 'ret' is never read [clang-analyzer-deadcode.DeadStores]

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
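The flagged pattern, in schematic form (a reconstruction, not the exact xt_CT.c hunk):

    int ret;

    ret = 0;                  /* dead store: this value is never read ... */
    ret = setup_helper();     /* ... because it is overwritten here */
    if (ret < 0)
            return ret;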
-
Jason Baron authored
We've seen this spin_lock show up high in profiles. Let's introduce a lockless version. I've tested this using pktgen_sample01_simple.sh.

Signed-off-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
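The commit message doesn't show the data structure involved; purely as a generic illustration of this kind of lock elision, a spinlock-protected counter can become per-CPU counters that readers sum:

    DEFINE_PER_CPU(unsigned long, pkt_count);         /* file scope */

    static void count_packet_locked(struct stats *s)
    {
            spin_lock(&s->lock);      /* before: hot path serializes here */
            s->packets++;
            spin_unlock(&s->lock);
    }

    static void count_packet_lockless(void)
    {
            this_cpu_inc(pkt_count);  /* after: no shared lock; a reader
                                       * sums the per-CPU values instead */
    }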
-
Juerg Haefliger authored
Remove leading spaces before tabs in Kconfig file(s) by running the following command:

    $ find net/netfilter -name 'Kconfig*' | xargs sed -r -i 's/^[ ]+\t/\t/'

Signed-off-by: Juerg Haefliger <juergh@canonical.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Florian Westphal authored
Extend nft_set_do_lookup() to use direct calls when the retpoline feature is enabled.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
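A sketch of the direct-call dispatch idea; the backend names here are illustrative, while the real helper covers the actual nf_tables set types:

    #ifdef CONFIG_RETPOLINE
    static inline bool nft_set_do_lookup(const struct net *net,
                                         const struct nft_set *set,
                                         const u32 *key,
                                         const struct nft_set_ext **ext)
    {
            /* compare the ops pointer and call the backend directly,
             * avoiding a retpoline thunk on the indirect ops->lookup() */
            if (set->ops == &nft_set_hash_fast_type.ops)
                    return nft_hash_lookup_fast(net, set, key, ext);
            return set->ops->lookup(net, set, key, ext);
    }
    #else
    #define nft_set_do_lookup(net, set, key, ext) \
            ((set)->ops->lookup(net, set, key, ext))
    #endif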
-
Florian Westphal authored
A follow-up patch will add a CONFIG_RETPOLINE wrapper to avoid the ops->lookup() indirection cost for retpoline builds.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Stefano Brivio authored
We don't need a valid MXCSR state for the lookup routines: none of the instructions we use rely on or affect any bit in the MXCSR register.

Instead of calling kernel_fpu_begin(), we can pass 0 as mask to kernel_fpu_begin_mask() and spare one LDMXCSR instruction.

Commit 49200d17 ("x86/fpu/64: Don't FNINIT in kernel_fpu_begin()") already speeds up lookups considerably, and by dropping the MXCSR initialisation we can now get a much smaller, but measurable, increase in matching rates.

The table below reports matching rates and a wild approximation of clock cycles needed for a match in a "port,net" test with 10 entries from selftests/netfilter/nft_concat_range.sh, limited to the first field, i.e. the port (with nft_set_rbtree initialisation skipped), run on a single AMD Epyc 7351 thread (2.9GHz, 512 KiB L1D$, 8 MiB L2$).

The (very rough) estimation of clock cycles is obtained by simply dividing frequency by matching rate. The "cycles spared" column refers to the difference in cycles compared to the previous row, and the rate increase also refers to the previous row. Results are averages of six runs.

Merely for context, I'm also reporting packet rates obtained by skipping kernel_fpu_begin() and kernel_fpu_end() altogether (which shows a very limited impact now), as well as skipping the whole lookup function, compared to simply counting and dropping all packets using the netdev hook drop (see nft_concat_range.sh for details). This workload also includes packet generation with pktgen and the receive path of veth.

                                           |matching|  est.  | cycles |  rate  |
                                           |  rate  | cycles | spared |increase|
                                           | (Mpps) |        |        |        |
    ---------------------------------------|--------|--------|--------|--------|
    FNINIT, LDMXCSR (before 49200d17)      |  5.245 |    553 |      - |      - |
    LDMXCSR only (with 49200d17)           |  6.347 |    457 |     96 |  21.0% |
    Without LDMXCSR (this patch)           |  6.461 |    449 |      8 |   1.8% |
    -------- for reference only: ----------|--------|--------|--------|--------|
    Without kernel_fpu_begin()             |  6.513 |    445 |      4 |   0.8% |
    Without actual matching (return true)  |  7.649 |    379 |     66 |  17.4% |
    Without lookup operation (netdev drop) | 10.320 |    281 |     98 |  34.9% |

The clock cycles spared by avoiding LDMXCSR appear to be in line with CPI and latency indicated in the manuals of comparable architectures: Intel Skylake (CPI: 1, latency: 7) and AMD 12h (latency: 12) -- I couldn't find this information for AMD 17h.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
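The change itself is small; a sketch, assuming the lookup body is AVX2 code that never touches MXCSR state (the routine name is hypothetical):

    static bool lookup_with_fpu(const void *key)
    {
            bool ret;

            /* on x86-64, kernel_fpu_begin() expands to
             * kernel_fpu_begin_mask(KFPU_MXCSR); an empty mask skips
             * the LDMXCSR, which the AVX2 lookup doesn't need */
            kernel_fpu_begin_mask(0);
            ret = avx2_match(key);
            kernel_fpu_end();
            return ret;
    }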
-
Phil Sutter authored
Chunks are SCTP header extensions similar in implementation to IPv6 extension headers or TCP options. Reusing exthdr expression to find and extract field values from them is therefore pretty straightforward.

For now, this supports extracting data from chunks at a fixed offset (and length) only - chunks themselves are an extensible data structure; in order to make all fields available, a nested extension search is needed.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
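A hedged sketch of the chunk walk this implies, using the kernel's struct sctp_chunkhdr (helper and variable names are illustrative):

    /* walk chunks until 'type' is found; on-wire chunk lengths are
     * padded to 4-byte multiples, hence SCTP_PAD4() */
    static struct sctp_chunkhdr *find_chunk(struct sk_buff *skb,
                                            unsigned int off, u8 type,
                                            struct sctp_chunkhdr *buf)
    {
            struct sctp_chunkhdr *ch;

            do {
                    ch = skb_header_pointer(skb, off, sizeof(*buf), buf);
                    if (!ch || !ch->length)
                            return NULL;
                    if (ch->type == type)
                            return ch;
                    off += SCTP_PAD4(ntohs(ch->length));
            } while (off < skb->len);

            return NULL;
    }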
-
Merge tag 'mlx5-updates-2021-05-26' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Jakub Kicinski authored
Saeed Mahameed says:

====================
mlx5-updates-2021-05-26

Misc update for mlx5 driver:

1) Clean up patches for lag and SF
2) Reserve bit 31 in steering register C1 for IPSec offload usage
3) Move steering tables pool logic into the steering core and increase
   the maximum table size to 2G entries when software steering is enabled.

* tag 'mlx5-updates-2021-05-26' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5: Fix lag port remapping logic
  net/mlx5: Use boolean arithmetic to evaluate roce_lag
  net/mlx5: Remove unnecessary spin lock protection
  net/mlx5: Cap the maximum flow group size to 16M entries
  net/mlx5: DR, Set max table size to 2G entries
  net/mlx5: Move chains ft pool to be used by all firmware steering
  net/mlx5: Move table size calculation to steering cmd layer
  net/mlx5: Add case for FS_FT_NIC_TX FT in MLX5_CAP_FLOWTABLE_TYPE
  net/mlx5: DR, Remove unused field of send_ring struct
  net/mlx5e: RX, Remove unnecessary check in RX CQE compression handling
  net/mlx5e: IPsec/rep_tc: Fix rep_tc_update_skb drops IPsec packet
  net/mlx5e: TC: Reserved bit 31 of REG_C1 for IPsec offload
  net/mlx5e: TC: Use bit counts for register mapping
  net/mlx5: CT: Avoid reusing modify header context for natted entries
  net/mlx5e: CT, Remove newline from ct_dbg call
====================

Link: https://lore.kernel.org/r/20210527185624.694304-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 27 May, 2021 28 commits
-
Gustavo A. R. Silva authored
Replace union with a couple of pointers in order to fix the following out-of-bounds warning:

      CC [M]  drivers/net/ethernet/intel/ixgbe/ixgbe_common.o
    drivers/net/ethernet/intel/ixgbe/ixgbe_common.c: In function 'ixgbe_host_interface_command':
    drivers/net/ethernet/intel/ixgbe/ixgbe_common.c:3729:13: warning: array subscript 1 is above array bounds of 'u32[1]' {aka 'unsigned int[1]'} [-Warray-bounds]
     3729 |    bp->u32arr[bi] = IXGBE_READ_REG_ARRAY(hw, IXGBE_FLEX_MNG, bi);
          |    ~~~~~~~~~~^~~~
    drivers/net/ethernet/intel/ixgbe/ixgbe_common.c:3682:7: note: while referencing 'u32arr'
     3682 |   u32 u32arr[1];
          |       ^~~~~~

This helps with the ongoing efforts to globally enable -Warray-bounds.

Link: https://github.com/KSPP/linux/issues/109
Co-developed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20210527173424.362456-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
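The shape of the fix, reduced to a schematic (simplified; the real code keeps the same buffer and register-read loop):

    /* before: a union whose one-element array gets indexed past its
     * declared bound, which -Warray-bounds rightly flags */
    union {
            struct ixgbe_hic_hdr hdr;
            u32 u32arr[1];
    } *bp = buffer;

    /* after: two plain pointers aliasing the same buffer, with no
     * fixed-size array for the compiler to bounds-check */
    struct ixgbe_hic_hdr *hdr = buffer;
    u32 *u32arr = buffer;

    u32arr[bi] = IXGBE_READ_REG_ARRAY(hw, IXGBE_FLEX_MNG, bi);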
-
Jakub Kicinski authored
Gatis Peisenieks says:

====================
add 4 RX/TX queue support for Mikrotik 10/25G NIC

More RX/TX queues on a network card help spread the CPU load among cores and achieve higher overall networking performance. This patch set adds support for 4 RX/TX queues available on Mikrotik 10/25G NIC.

v4:
 - addressed comments from Jakub Kicinski:
   - split up the change in more manageable chunks
   - changed member order in structs for tighter packing
   - fixed style issues
   - reverted to calling napi_alloc_skb only from within poll as before
v3:
 - fix kernel-doc complaints on comments as pointed out by David Miller
v2:
 - rebase on net-next master as requested by David Miller
====================

Link: https://lore.kernel.org/r/20210527144423.3395719-1-gatis@mikrotik.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Gatis Peisenieks authored
More RX/TX queues on a network card help spread the CPU load among cores and achieve higher overall networking performance. The new Mikrotik 10/25G NIC supports 4 RX and 4 TX queues. TX queues are treated with equal priority. RX queue balancing is fixed, based on L2/L3/L4 hash.

This adds support for 4 RX/TX queues while maintaining backwards compatibility with older hardware. Simultaneous TX + RX performance on an AMD Threadripper 3960X with a Mikrotik 10/25G NIC improved from 1.6Mpps to 3.2Mpps per port.

Backwards compatibility was verified with AR8151 and AR8131 based NICs.

Signed-off-by: Gatis Peisenieks <gatis@mikrotik.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Gatis Peisenieks authored
Move napi and other per-queue members into a per-rx-queue struct. Allocate the maximum number of rx queues that any hw supported by the driver might have. The patch that actually enables multiple rx queues will follow.

Signed-off-by: Gatis Peisenieks <gatis@mikrotik.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Gatis Peisenieks authored
To get more performance from using multiple tx queues, one needs a per-tx-queue napi. Move the tx napi from the per-adapter struct into the per-tx-queue struct. The patch that actually enables multiple tx queues will follow.

Signed-off-by: Gatis Peisenieks <gatis@mikrotik.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
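A sketch of the move, with hypothetical atl1c-style names (the exact member and function names may differ):

    struct atl1c_tpd_ring {
            /* ... descriptor ring members ... */
            struct napi_struct napi;       /* was: a single napi in the
                                            * per-adapter struct */
            struct atl1c_adapter *adapter; /* back-pointer for the poll fn */
    };

    /* probe path: register one NAPI context per TX queue */
    for (i = 0; i < adapter->tx_queue_count; i++)
            netif_napi_add(netdev, &adapter->tpd_ring[i].napi,
                           atl1c_clean_tx, 64);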
-
Gatis Peisenieks authored
To support NICs that allow for more than one tx queue, it is required to detect the NIC type early during probe. This moves NIC type detection before netdev alloc to prepare for that.

Signed-off-by: Gatis Peisenieks <gatis@mikrotik.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Jiri Pirko says:

====================
mlx*: devlink dev info versions adjustments

Couple of adjustments in Mellanox drivers regarding devlink dev versions fill-up.
====================

Link: https://lore.kernel.org/r/20210526104509.761807-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jiri Pirko authored
Instead of having the string spelled out in the driver, use the global define with the same value.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jiri Pirko authored
To be aligned with the rest of the drivers, expose the FW version under the "fw" keyword in devlink dev info, in addition to the existing "fw.version", which is currently Mellanox-specific.

devlink output before:

    running:
      fw.version 30.2008.2018

after:

    running:
      fw.version 30.2008.2018
      fw 30.2008.2018

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
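A sketch of the addition; devlink_info_version_running_put() and the DEVLINK_INFO_VERSION_GENERIC_FW ("fw") name are the real devlink API, while the surrounding callback is abbreviated:

    static int example_devlink_info_get(struct devlink *devlink,
                                        struct devlink_info_req *req,
                                        struct netlink_ext_ack *extack)
    {
            char fw_ver[32];
            int err;

            /* assume fw_ver was read from the device, e.g. "30.2008.2018" */

            /* keep the old Mellanox-specific name for compatibility */
            err = devlink_info_version_running_put(req, "fw.version", fw_ver);
            if (err)
                    return err;

            /* ... and also expose it under the generic "fw" keyword */
            return devlink_info_version_running_put(
                            req, DEVLINK_INFO_VERSION_GENERIC_FW, fw_ver);
    }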
-
Jiri Pirko authored
To be aligned with the rest of the drivers, expose the FW version under the "fw" keyword in devlink dev info, in addition to the existing "fw.version", which is currently Mellanox-specific.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jiri Pirko authored
If the user passes a devlink handle over the DEVLINK_DEV variable, check that the device exists.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20210527105515.790330-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Merge tag 'linux-can-next-for-5.14-20210527' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
can-next 2021-05-27

The first 2 patches are by Geert Uytterhoeven and convert the rcar_can and rcar_canfd device tree bindings to yaml.

The next 2 patches are by Oliver Hartkopp and me and update the CAN uapi headers.

zuoqilin's patch removes an unnecessary variable from the CAN proc code.

Patrick Menschel contributes 3 patches for CAN ISOTP to enhance the error messages.

Jiapeng Chong's patch removes two dead stores from the softing driver.

The next 4 patches are by me and silence several warnings found by the clang compiler.

Jimmy Assarsson's patches for the kvaser_usb driver add support for the Kvaser hydra devices.

Dario Binacchi provides 2 patches for the c_can driver, first removing an unused variable, then adding basic ethtool support to query driver and ring parameter info.

The last 4 patches are by Torin Cooper-Bennun and clean up the m_can driver.

* tag 'linux-can-next-for-5.14-20210527' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next: (21 commits)
  can: m_can: fix whitespace in a few comments
  can: m_can: make TXESC, RXESC config more explicit
  can: m_can: clean up CCCR reg defs, order by revs
  can: m_can: use bits.h macros for all regmasks
  can: c_can: add ethtool support
  can: c_can: remove unused variable struct c_can_priv::rxmasked
  can: kvaser_usb: Add new Kvaser hydra devices
  can: kvaser_usb: Rename define USB_HYBRID_{,PRO_}CANLIN_PRODUCT_ID
  can: at91_can: silence clang warning
  can: mcp251xfd: silence clang warning
  can: mcp251x: mcp251x_can_probe(): silence clang warning
  can: hi311x: hi3110_can_probe(): silence clang warning
  can: softing: Remove redundant variable ptr
  can: isotp: Add error message if txqueuelen is too small
  can: isotp: add symbolic error message to isotp_module_init()
  can: isotp: change error format from decimal to symbolic error names
  can: proc: remove unnecessary variables
  can: uapi: introduce CANFD_FDF flag for mixed content in struct canfd_frame
  can: uapi: update CAN-FD frame description
  dt-bindings: can: rcar_canfd: Convert to json-schema
  ...
====================

Link: https://lore.kernel.org/r/20210527084532.1384031-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jiri Pirko authored
Instead of calling sprintf twice depending on whether the port is split or not, append the split port suffix only in case the port is split.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20210527104819.789840-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Eli Cohen authored
Fix the logic so that if both ports' netdevices are enabled or both are disabled, the trivial mapping is used without swapping. If only one of the netdevices' tx is enabled, use it to remap traffic to that port.

Signed-off-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Eli Cohen authored
Avoid mixing boolean and bit arithmetic when evaluating validity of roce_lag.

Signed-off-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Eli Cohen authored
Taking lag_lock to access ldev->tracker is meaningless in the context of do_bond() and mlx5_lag_netdev_event().

Signed-off-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Paul Blakey authored
The maximum number of large flow groups applies to both small and large tables. For very large tables (such as the 2G SW steering tables) this may create a small number of flow groups, each with an unrealistic entries domain (> 16M).

Set the maximum number of large flow groups to at least what the user requested, but with a maximum per-group size of 16M entries. For software steering, if the user requested fewer than 128 large flow groups, this gives us about 128 16M groups in a 2G entries table.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Paul Blakey authored
SW steering has no table size limitations. However, the fs_core API is size aware. Set SW steering tables to the maximum possible table size (INT_MAX).

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Paul Blakey authored
The firmware FT pool is per device, but the software tracking of this pool only services fs_chains users; if another layer takes a flow table, the pool will not be updated, and fs_chains will fail creating a flow table, with no recovery until the flow table is returned.

Move the FT pool to be global per device, stored at the cmd level, so all layers can use it.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Paul Blakey authored
Currently the table size is calculated by the fs_core layer. However, each steering cmd instance has a different allocation logic: FW steering uses predefined pools of multiple sizes, while SW steering doesn't have a pool and can allocate tables of any size.

Move the table size calculation to the steering cmd layer as a pre-step for moving the fs_chains pool logic globally to firmware steering, and for increasing table sizes for software steering. In addition, change the size parameter to an absolute size to allow the special zero value to mean "get next available maximum size".

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Paul Blakey authored
Commit 16f1c5bb ("net/mlx5: Check device capability for maximum flow counters") added MLX5_CAP_FLOWTABLE_TYPE but forgot to account for the FS_FT_NIC_TX case in the expression. Although the expression will return 1 for this case instead of the actual cap, there are currently no known side effects of missing this case. Add the FS_FT_NIC_TX case.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Yevgeny Kliteynik authored
Remove an unused field of struct mlx5dr_send_ring.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Tariq Toukan authored
There are two reasons for exiting mlx5e_decompress_cqes_cont():

1. The compression session is completed (cqd.left == 0).
2. The budget is exhausted (work_done == budget).

If after calling mlx5e_decompress_cqes_cont() we have cqd.left > 0, it necessarily implies that the budget is exhausted. The first part of the complex condition is covered by the second, hence we remove it here.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
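Schematically, as a reconstruction of the hunk rather than the verbatim mlx5e code:

    if (rq->cqd.left) {
            work_done += mlx5e_decompress_cqes_cont(rq, cqwq, 0, budget);
    -       if (rq->cqd.left || work_done == budget)
    +       if (work_done == budget)
                    goto out;
    }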
-
Huy Nguyen authored
rep_tc copies REG_C1 to REG_B. IPsec crypto utilizes the whole REG_B register with BIT31 as the IPsec marker, so rep_tc_update_skb drops the IPsec packet because it thinks REG_B contains a bad value. The previous patch reserved bit 31 of REG_C1 for IPsec. Skip rep_tc_update_skb if bit 31 of REG_B is set.

Signed-off-by: Huy Nguyen <huyn@nvidia.com>
Signed-off-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Huy Nguyen authored
Currently the ASAP features fully utilize all the bits of the CQE's flow tag and ft_metadata field. The flow tag field cannot be used because flow table tagging in FTE does not allow partial writes. We agreed to reserve bit 31 of the CQE's ft_metadata for IPsec, to avoid ASAP CT dropping IPsec offloaded packets.

Here is the new bit layout of REG_C1; the tunnel option id is reduced to 11 bits:

    < IPSEC MARKER (1) | ESW_TUN_ID(12) | ESW_TUN_OPTS(11) | ESW_ZONE_ID(8) >

Signed-off-by: Huy Nguyen <huyn@nvidia.com>
Signed-off-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Paul Blakey <paulb@nvidia.com>
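The same layout expressed as bit-width macros, purely as a sketch (these names mirror the text above and are not guaranteed to match the driver's):

    #define ESW_ZONE_ID_BITS           8
    #define ESW_TUN_OPTS_BITS         11    /* was 12: one bit ceded to IPsec */
    #define ESW_TUN_ID_BITS           12
    #define ESW_IPSEC_MARKER_BITS      1
    /* REG_C1: [31] IPsec | [30:19] tun id | [18:8] tun opts | [7:0] zone */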
-
Paul Blakey authored
To prepare for the next patch, where we will use a non-byte-aligned mapping, change all byte counts in the register mapping to bits.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Paul Blakey authored
Currently the driver is designed to reuse header modify context entries. Natted entries will always have a unique modify header, so for them the modify header hashtable lookup just introduces overhead: when the hashtable size exceeded 200k entries, the tested insertion rate dropped from ~10k entries/sec to ~300 entries/sec.

Don't use the reuse mechanism when creating modify headers for natted tuples.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
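The gist of the change, sketched with names from the mlx5 mod-header API (the surrounding logic is illustrative, not the actual ct hunk):

    if (nat) {
            /* NAT tuples are unique: allocate a fresh modify header and
             * skip the hashtable that exists to share identical ones */
            attr->modify_hdr = mlx5_modify_header_alloc(dev, ns_type,
                                                        acts->num_actions,
                                                        acts->actions);
    } else {
            /* non-NAT tuples can still share via the reuse hashtable */
            handle = mlx5e_mod_hdr_attach(dev, tbl, ns_type, acts);
            attr->modify_hdr = mlx5e_mod_hdr_get(handle);
    }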
-
Roi Dayan authored
ct_dbg() already adds a newline.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-