- 28 Sep, 2022 22 commits
-
-
Kees Cook authored
To work around a misbehavior of the compiler's ability to see into composite flexible array structs (as detailed in the coming memcpy() hardening series[1]), split the memcpy() of the header and the payload so no false positive run-time overflow warning will be generated. [1] https://lore.kernel.org/linux-hardening/20220901065914.1417829-2-keescook@chromium.org Cc: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/20220927004033.1942992-1-keescook@chromium.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Alex Elder says: ==================== net: ipa: generalized register definitions This series is quite a bit bigger than what I normally like to send, and I apologize for that. I would like it to get incorporated in its entirety this week if possible, and splitting up the series carries a small risk that wouldn't happen. Each IPA register has a defined offset, and in most cases, a set of masks that define the width and position of fields within the register. Most registers currently use the same offset for all versions of IPA. Usually fields within registers are also the same across many versions. Offsets and fields like this are defined using preprocessor constants. When a register has a different offset for different versions of IPA, an inline function is used to determine its offset. And in places where a field differs between versions, an inline function is used to determine how a value is encoded within the field, depending on IPA version. Starting with IPA version 5.0, the number of IPA endpoints supported is greater than 32. As a consequence, *many* IPA register offsets differ considerably from prior versions. This increase in endpoints also requires a lot of field sizes and/or positions to change (such as those that contain an endpoint ID). Defining these things with constants is no longer simple, and rather than fill the code with one-off functions to define offsets and encode field values, this series puts in place a new way of defining IPA registers and their fields. Note that this series creates this new scheme, but does not add IPA v5.0+ support. An enumerated type will now define a unique ID for each IPA register. Each defined register will have a structure that contains its offset and its name (a printable string). Each version of IPA will have an array of these register structures, indexed by register ID. Some "parameterized" registers are duplicated (this is not new). For example, each endpoint has an INIT_HDR register, and the offset of a given endpoint's INIT_HDR register is dependent on the endpoint number (the parameter). In such cases, the register's "stride" is defined as the distance between two of these registers. If a register contains fields, each field will have a unique ID that's used as an index into an array of field masks defined for the register. The register structure also defines the number of entries in this field array. When a register is to be used in code, its register structure will be fetched using function ipa_reg(). Other functions are then used to determine the register's offset, or to encode a value into one of the register's fields, and so on. Each version of IPA defines the set of registers that are available, including all fields for these registers. The array of defined registers is set up at probe time based on the IPA version, and it is associated with the main IPA structure. ==================== Link: https://lore.kernel.org/r/20220926220931.3261749-1-elder@linaro.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alex Elder authored
Define the fields for the ENDP_INIT_DEAGGR, ENDP_INIT_RSRC_GRP, ENDP_INIT_SEQ, ENDP_STATUS, and ENDP_FILTER_ROUTER_HSH_CFG, and IPA_IRQ_UC IPA registers for all supported IPA versions. Create enumerated types to identify fields for these IPA registers. Use IPA_REG_FIELDS() and IPA_REG_STRIDE_FIELDS() to specify the field mask values defined for these registers, for each supported version of IPA. Use ipa_reg_encode() and ipa_reg_bit() to build up the values to be written to these registers, remove an inline function and all the *_FMASK symbols that are now no longer used. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alex Elder authored
Define the fields for the ENDP_INIT_MODE, ENDP_INIT_AGGR, ENDP_INIT_HOL_BLOCK_EN, and ENDP_INIT_HOL_BLOCK_TIMER IPA registers for all supported IPA versions. Create enumerated types to identify fields for these IPA registers. Use IPA_REG_STRIDE_FIELDS() to specify the field mask values defined for these registers, for each supported version of IPA. Change aggr_time_limit_encode() and hol_block_timer_encode() so they take an ipa_reg pointer, and use those register's fields to compute their encoded results. Have aggr_time_limit_encode() take an IPA pointer rather than version, to match hol_block_timer_encode(). Use ipa_reg_encode(), ipa_reg_bit(), and ipa_reg_field_max() to manipulate values to be written to these registers, remove the definitions of the various inline functions and *_FMASK symbols that are now no longer used. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alex Elder authored
Define the fields for the ENDP_INIT_CTRL, ENDP_INIT_CFG, ENDP_INIT_NAT, ENDP_INIT_HDR, and ENDP_INIT_HDR_EXT IPA registers for all supported IPA versions. Create enumerated types to identify fields for these IPA registers. Use IPA_REG_STRIDE_FIELDS() to specify the field mask values defined for these registers, for each supported version of IPA. Move ipa_header_size_encoded() and ipa_metadata_offset_encoded() out of "ipa_reg.h" and into "ipa_endpoint.c". Change them so they take an additional ipa_reg structure argument, and use ipa_reg_encode() to encode the parts of the header size and offset prior to writing to the register. Change their names to be verbs rather than nouns. Use ipa_reg_encode(), ipa_reg_bit, and ipa_reg_field_max() to manipulate values to be written to these registers, remove the definition of the no-longer-used *_FMASK symbols. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alex Elder authored
Define the fields for the {SRC,DST}_RSRC_GRP_{01,23,45,67}_RSRC_TYPE IPA registers for all supported IPA versions. Create enumerated types to identify fields for these IPA registers. Use IPA_REG_STRIDE_FIELDS() to specify the field mask values defined for these registers, for each supported version of IPA. Use ipa_reg_encode() to build up the values to be written to these registers. Remove the definition of the no-longer-used *_FMASK symbols. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alex Elder authored
Define the fields for the FLAVOR_0, IDLE_INDICATION_CFG, QTIME_TIMESTAMP_CFG, TIMERS_XO_CLK_DIV_CFG and TIMERS_PULSE_GRAN_CFG IPA registers for all supported IPA versions. Create enumerated types to identify fields for these IPA registers. Use IPA_REG_FIELDS() to specify the field mask values defined for these registers, for each supported version of IPA. Use ipa_reg_bit() and ipa_reg_encode() to build up the values to be written to these registers. Use ipa_reg_decode() to extract field values from the FLAVOR_0 register. Remove the definition of the no-longer-used *_FMASK symbols. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alex Elder authored
Define the fields for the LOCAL_PKT_PROC_CNTXT, COUNTER_CFG, and IPA_TX_CFG IPA registers for all supported IPA versions. Create enumerated types to identify fields for these IPA registers. Use IPA_REG_FIELDS() to specify the field mask values defined for these registers, for each supported version of IPA. Use ipa_reg_bit() and ipa_reg_encode() to build up the values to be written to these registers. Remove the definition of the *_FMASK symbols as well as proc_cntxt_base_addr_encoded(), because they are no longer needed. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alex Elder authored
Define the fields for the SHARED_MEM_SIZE, QSB_MAX_WRITES, QSB_MAX_READS, FILT_ROUT_HASH_EN, and FILT_ROUT_HASH_FLUSH IPA registers for all supported IPA versions. Create enumerated types to identify fields for these registers. Use IPA_REG_FIELDS() to specify the field mask values defined for these registers, for each supported version of IPA. Use ipa_reg_bit() and ipa_reg_encode() to build up the values to be written to these registers rather than using the *_FMASK preprocessor symbols. Remove the definition of the now unused *_FMASK symbols. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alex Elder authored
Create the ipa_reg_clkon_cfg_field_id enumerated type, which identifies the fields for the CLKON_CFG IPA register. Add "CLKON_" to a few short names to try to avoid name conflicts. Create the ipa_reg_route_field_id enumerated type, which identifies the fields for the ROUTE IPA register. Use IPA_REG_FIELDS() to specify the field mask values defined for these registers, for each supported version of IPA. Use ipa_reg_bit() and ipa_reg_encode() to build up the values to be written to these registers rather than using the *_FMASK preprocessor symbols. Remove the definition of the now unused *_FMASK symbols. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alex Elder authored
Create the ipa_reg_comp_cfg_field_id enumerated type, which identifies the fields for the COMP_CFG IPA register. Use IPA_REG_FIELDS() to specify the field mask values defined for this register, for each supported version of IPA. Use ipa_reg_bit() to build up the value to be written to this register rather than using the *_FMASK preprocessor symbols. Remove the definition of the *_FMASK symbols, along with the inline functions that were used to encode certain fields whose position and/or width within the register was dependent on IPA version. Take this opportunity to represent all one-bit fields using BIT(x) rather than GENMASK(x, x). Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alex Elder authored
Add register field descriptors to the ipa_reg structure. A field in a register is defined by a field mask, which is a 32-bit mask having a single contiguous range of bits set. For each register that has at least one field defined, an enumerated type will identify the register's fields. The ipa_reg structure for that register will include an array fmask[] of field masks, indexed by that enumerated type. Each field mask defines the position and bit width of a field. An additional "fcount" records how many fields (masks) are defined for a given register. Introduce two macros to be used to define registers that have at least one field. Introduce a few new functions related to field masks. The first simply returns a field mask, given an IPA register pointer and field mask ID. A variant of that is meant to be used for the special case of single-bit field masks. Next, ipa_reg_encode(), identifies a field with an IPA register pointer and a field ID, and takes a value to represent in that field. The result encodes the value in the appropriate place to be stored in the register. This is roughly modeled after the bitmask operations (like u32_encode_bits()). Another function (ipa_reg_decode()) similarly identifies a register field, but the value supplied to it represents a full register value. The value encoded in the field is extracted from the value and returned. This is also roughly modeled after bitmask operations (such as u32_get_bits()). Finally, ipa_reg_field_max() returns the maximum value representable by a field. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alex Elder authored
Create a new function that returns a register descriptor given its ID. Change ipa_reg_offset() and ipa_reg_n_offset() so they take a register descriptor argument rather than an IPA pointer and register ID. Have them accept null pointers (and return an invalid 0 offset), to avoid the need for excessive error checking. (A warning is issued whenever ipa_reg() returns 0). Call ipa_reg() or ipa_reg_n() to look up information about the register before calls to ipa_reg_offset() and ipa_reg_n_offset(). Delay looking up offsets until they're needed to read or write registers. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alex Elder authored
Use the array of register descriptors assigned at initialization time to determine the offset (and where used, stride) for IPA registers. Issue a warning if an offset is requested for a register that's not valid for the current system. Remove all IPE_REG_*_OFFSET macros, as well as inline static functions that returned register offsets. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alex Elder authored
Create a new subdirectory "reg", which contains a register definition file for each supported version of IPA. Each register definition contains the register's offset, and for parameterized registers, the stride (distance between consecutive instances of the register). Finally, it includes an all-caps printable register name. In these files, each IPA version defines an array of IPA register definition pointers, with unsupported registers defined with a null pointer. The array is indexed by the ipa_reg_id enumerated type. At initialization time, the appropriate register definition array to use is selected based on the IPA version, and assigned to a new "regs" field in the IPA structure. Extend ipa_reg_valid() so it fails if a valid register is not defined. This patch simply puts this infrastructure in place; the next will use it. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alex Elder authored
Expose two inline functions that return the offset for a register whose ID is provided; one of them takes an additional argument that's used for registers that are parameterized. These both use a common helper function __ipa_reg_offset(), which just uses the offset symbols already defined. Replace all references to the offset macros defined for IPA registers with calls to ipa_reg_offset() or ipa_reg_n_offset(). Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alex Elder authored
Create a new ipa_reg_id enumerated type, which identifies each IPA register with a symbolic identifier. Use short names, but in some cases (such as "BCR") add "IPA_" to the name to help avoid name conflicts. Create two functions that indicate register validity. The first concisely indicates whether a register is valid for a given version of IPA, and if so, whether it is defined. The second indicates whether a register is valid for TX or RX endpoints. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Kees Cook authored
To work around a misbehavior of the compiler's ability to see into composite flexible array structs (as detailed in the coming memcpy() hardening series[1]), split the memcpy() of the header and the payload so no false positive run-time overflow warning will be generated. [1] https://lore.kernel.org/linux-hardening/20220901065914.1417829-2-keescook@chromium.org/ Cc: Wenjia Zhang <wenjia@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Link: https://lore.kernel.org/r/20220927003953.1942442-1-keescook@chromium.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Donald Hunter authored
Enumerate the skb drop reasons in the receive path for IPv6 UDP packets. Signed-off-by: Donald Hunter <donald.hunter@redhat.com> Link: https://lore.kernel.org/r/20220926120350.14928-1-donald.hunter@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Bo Liu authored
Use ida_alloc_xxx()/ida_free() instead of ida_simple_get()/ida_simple_remove(). The latter is deprecated and more verbose. Signed-off-by: Bo Liu <liubo03@inspur.com> Link: https://lore.kernel.org/r/20220926012744.3363-1-liubo03@inspur.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Taehee Yoo authored
RFC 6209 describes ARIA for TLS 1.2. ARIA-128-GCM and ARIA-256-GCM are defined in RFC 6209. This patch would offer performance increment and an opportunity for hardware offload. Benchmark results: iperf-ssl are used. CPU: intel i3-12100. TLS(openssl-3.0-dev) [ 3] 0.0- 1.0 sec 185 MBytes 1.55 Gbits/sec [ 3] 1.0- 2.0 sec 186 MBytes 1.56 Gbits/sec [ 3] 2.0- 3.0 sec 186 MBytes 1.56 Gbits/sec [ 3] 3.0- 4.0 sec 186 MBytes 1.56 Gbits/sec [ 3] 4.0- 5.0 sec 186 MBytes 1.56 Gbits/sec [ 3] 0.0- 5.0 sec 927 MBytes 1.56 Gbits/sec kTLS(aria-generic) [ 3] 0.0- 1.0 sec 198 MBytes 1.66 Gbits/sec [ 3] 1.0- 2.0 sec 194 MBytes 1.62 Gbits/sec [ 3] 2.0- 3.0 sec 194 MBytes 1.63 Gbits/sec [ 3] 3.0- 4.0 sec 194 MBytes 1.63 Gbits/sec [ 3] 4.0- 5.0 sec 194 MBytes 1.62 Gbits/sec [ 3] 0.0- 5.0 sec 974 MBytes 1.63 Gbits/sec kTLS(aria-avx wirh GFNI) [ 3] 0.0- 1.0 sec 632 MBytes 5.30 Gbits/sec [ 3] 1.0- 2.0 sec 657 MBytes 5.51 Gbits/sec [ 3] 2.0- 3.0 sec 657 MBytes 5.51 Gbits/sec [ 3] 3.0- 4.0 sec 656 MBytes 5.50 Gbits/sec [ 3] 4.0- 5.0 sec 656 MBytes 5.50 Gbits/sec [ 3] 0.0- 5.0 sec 3.18 GBytes 5.47 Gbits/sec Signed-off-by: Taehee Yoo <ap420073@gmail.com> Reviewed-by: Vadim Fedorenko <vfedorenko@novek.ru> Link: https://lore.kernel.org/r/20220925150033.24615-1-ap420073@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Bhupesh Sharma authored
Minor spell fix related to 'stmmac_clk_csr_set()' inside a comment used in the 'stmmac_probe_config_dt()' function. Cc: Biao Huang <biao.huang@mediatek.com> Signed-off-by: Bhupesh Sharma <bhupesh.sharma@linaro.org> Link: https://lore.kernel.org/r/20220924104514.1666947-1-bhupesh.sharma@linaro.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 27 Sep, 2022 18 commits
-
-
Pavel Begunkov authored
d8b6171b ("selftests/io_uring: test zerocopy send") added io_uring zerocopy tests but forgot to enable it in make runs. Add missing io_uring_zerocopy_tx.sh into TEST_PROGS. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/28e743602cdd54ffc49f68bbcbcbafc59ba22dc2.1664142210.git.asml.silence@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Jiri Pirko says: ==================== devlink: fix order of port and netdev register in drivers Some of the drivers use wrong order in registering devlink port and netdev, registering netdev first. That was not intended as the devlink port is some sort of parent for the netdev. Fix the ordering. Note that the follow-up patchset is going to make this ordering mandatory. ==================== Link: https://lore.kernel.org/r/20220926110938.2800005-1-jiri@resnulli.usSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jiri Pirko authored
Make sure that devlink port is registered first and register netdev after. Unregister netdev before devlnk port unregister. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Acked-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jiri Pirko authored
Make sure that netdevice is registered/unregistered while devlink port is registered. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jiri Pirko authored
Fix the order of destroy_netdev() flow and unregister the devlink port after calling unregister_netdev(). Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Christophe JAILLET authored
Remove some left-over from commit e2be04c7 ("License cleanup: add SPDX license identifier to uapi header files with a license") When the SPDX-License-Identifier tag has been added, the corresponding license text has not been removed. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: Alexander Duyck <alexanderduyck@fb.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://lore.kernel.org/r/88410cddd31197ea26840d7dd71612bece8c6acf.1663871981.git.christophe.jaillet@wanadoo.frSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Kees Cook authored
To work around a misbehavior of the compiler's ability to see into composite flexible array structs (as detailed in the coming memcpy() hardening series[1]), split the memcpy() of the header and the payload so no false positive run-time overflow warning will be generated. This split already existed for the "firstfrag" case, so just generalize the logic further. [1] https://lore.kernel.org/linux-hardening/20220901065914.1417829-2-keescook@chromium.org/ Cc: Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Reported-by: "Gustavo A. R. Silva" <gustavoars@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20220924040835.3364912-1-keescook@chromium.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Daniel Golle authored
As sizeof(hwe->data) can now longer be used as the actual size depends on foe_entry_size, in commit 9d8cb4c0 ("net: ethernet: mtk_eth_soc: add foe_entry_size to mtk_eth_soc") the use of sizeof(hwe->data) is hence replaced. However, replacing it with ppe->eth->soc->foe_entry_size is wrong as foe_entry_size represents the size of the whole descriptor and not just the 'data' field. Fix this by subtracing the size of the only other field in the struct 'ib1', so we actually end up with the correct size to be copied to the data field. Reported-by: Chen Minqiang <ptpt52@gmail.com> Fixes: 9d8cb4c0 ("net: ethernet: mtk_eth_soc: add foe_entry_size to mtk_eth_soc") Signed-off-by: Daniel Golle <daniel@makrotopia.org> Acked-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/YzBqPIgQR2gLrPoK@makrotopia.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Daniel Golle authored
In function mtk_foe_entry_set_vlan() the call to field accessor macro FIELD_GET(MTK_FOE_IB1_BIND_VLAN_LAYER, entry->ib1) has been wrongly replaced by mtk_prep_ib1_vlan_layer(eth, entry->ib1) Use correct helper function mtk_get_ib1_vlan_layer instead. Reported-by: Chen Minqiang <ptpt52@gmail.com> Fixes: 03a3180e ("net: ethernet: mtk_eth_soc: introduce flow offloading support for mt7986") Signed-off-by: Daniel Golle <daniel@makrotopia.org> Acked-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/YzBp+Kk04CFDys4L@makrotopia.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Paolo Abeni authored
Michael Weiß says: ==================== net: openvswitch: metering and conntrack in userns Currently using openvswitch in a non-initial user namespace, e.g., an unprivileged container, is possible but without metering and conntrack support. This is due to the restriction of the corresponding Netlink interfaces to the global CAP_NET_ADMIN. This simple patches switch from GENL_ADMIN_PERM to GENL_UNS_ADMIN_PERM in several cases to allow this also for the unprivileged container use case. We tested this for unprivileged containers created by the container manager of GyroidOS (gyroidos.github.io). However, for other container managers such as LXC or systemd which provide unprivileged containers this should be apply equally. ==================== Link: https://lore.kernel.org/r/20220923133820.993725-1-michael.weiss@aisec.fraunhofer.deSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Michael Weiß authored
Similar to the previous commit, the Netlink interface of the OVS conntrack module was restricted to global CAP_NET_ADMIN by using GENL_ADMIN_PERM. This is changed to GENL_UNS_ADMIN_PERM to support unprivileged containers in non-initial user namespace. Signed-off-by: Michael Weiß <michael.weiss@aisec.fraunhofer.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Michael Weiß authored
The Netlink interface for metering was restricted to global CAP_NET_ADMIN by using GENL_ADMIN_PERM. To allow metring in a non-inital user namespace, e.g., a container, this is changed to GENL_UNS_ADMIN_PERM. Signed-off-by: Michael Weiß <michael.weiss@aisec.fraunhofer.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Tony Lu authored
This enables SO_REUSEPORT [1] for clcsock when it is set on smc socket, so that some applications which uses it can be transparently replaced with SMC. Also, this helps improve load distribution. Here is a simple test of NGINX + wrk with SMC. The CPU usage is collected on NGINX (server) side as below. Disable SO_REUSEPORT: 05:15:33 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle 05:15:34 PM all 7.02 0.00 11.86 0.00 2.04 8.93 0.00 0.00 0.00 70.15 05:15:34 PM 0 0.00 0.00 0.00 0.00 16.00 70.00 0.00 0.00 0.00 14.00 05:15:34 PM 1 11.58 0.00 22.11 0.00 0.00 0.00 0.00 0.00 0.00 66.32 05:15:34 PM 2 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 98.00 05:15:34 PM 3 16.84 0.00 30.53 0.00 0.00 0.00 0.00 0.00 0.00 52.63 05:15:34 PM 4 28.72 0.00 44.68 0.00 0.00 0.00 0.00 0.00 0.00 26.60 05:15:34 PM 5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 05:15:34 PM 6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 05:15:34 PM 7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 Enable SO_REUSEPORT: 05:15:20 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle 05:15:21 PM all 8.56 0.00 14.40 0.00 2.20 9.86 0.00 0.00 0.00 64.98 05:15:21 PM 0 0.00 0.00 4.08 0.00 14.29 76.53 0.00 0.00 0.00 5.10 05:15:21 PM 1 9.09 0.00 16.16 0.00 1.01 0.00 0.00 0.00 0.00 73.74 05:15:21 PM 2 9.38 0.00 16.67 0.00 1.04 0.00 0.00 0.00 0.00 72.92 05:15:21 PM 3 10.42 0.00 17.71 0.00 1.04 0.00 0.00 0.00 0.00 70.83 05:15:21 PM 4 9.57 0.00 15.96 0.00 0.00 0.00 0.00 0.00 0.00 74.47 05:15:21 PM 5 9.18 0.00 15.31 0.00 0.00 1.02 0.00 0.00 0.00 74.49 05:15:21 PM 6 8.60 0.00 15.05 0.00 0.00 0.00 0.00 0.00 0.00 76.34 05:15:21 PM 7 12.37 0.00 14.43 0.00 0.00 0.00 0.00 0.00 0.00 73.20 Using SO_REUSEPORT helps the load distribution of NGINX be more balanced. [1] https://man7.org/linux/man-pages/man7/socket.7.htmlSigned-off-by: Tony Lu <tonylu@linux.alibaba.com> Acked-by: Wenjia Zhang <wenjia@linux.ibm.com> Link: https://lore.kernel.org/r/20220922121906.72406-1-tonylu@linux.alibaba.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Jakub Kicinski authored
Sean Anderson says: ==================== net: sunhme: Cleanups and logging improvements This series is a continuation of [1] with a focus on logging improvements (in the style of commit b11e5f6a ("net: sunhme: output link status with a single print.")). I have included several of Rolf's patches in the series where appropriate (with slight modifications). After this series is applied, many more messages from this driver will come with driver/device information. Additionally, most messages (especially debug messages) have been condensed onto one line (as KERN_CONT messages get split). [1] https://lore.kernel.org/netdev/4686583.GXAFRqVoOG@eto.sf-tec.de/ ==================== Link: https://lore.kernel.org/r/20220924015339.1816744-1-seanga2@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Sean Anderson authored
I have the hardware so at the very least I can test things. Signed-off-by: Sean Anderson <seanga2@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Sean Anderson authored
The SXD, TXD, and RXD macros are used only once (or twice). Just use the vdbg print, which seems to have been devised for these sorts of very verbose messages. Signed-off-by: Sean Anderson <seanga2@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Sean Anderson authored
This driver seems to have been written under the assumption that messages can be continued arbitrarily. I'm not when this changed (if ever), but such ad-hoc continuations are liable to be rudely interrupted. Convert all such instances to single prints. This loses a bit of timing information (such as when a line was constructed piecemeal as the function executed), but it's easy to add a few prints if necessary. This also adds newlines to the ends of any prints without them. Since (almost every) debug print included the name of the function, include it automatically. Signed-off-by: Sean Anderson <seanga2@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Sean Anderson authored
Wherever possible, use the associated netdev (or device) when printing errors or other messages. This makes it immediately clear what device caused the error, and provides more information than just the device name. Signed-off-by: Sean Anderson <seanga2@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-