- 14 Feb, 2022 13 commits
-
-
Dave Ertman authored
The status of support for RDMA is currently being tracked with two separate status flags. This is unnecessary with the current state of the driver. Simplify status tracking down to a single flag. Rename the helper function to denote the RDMA specific status and universally use the helper function to test the status bit. Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Leszek Kaliszczuk <leszek.kaliszczuk@intel.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Colin Foster says: ==================== use bulk reads for ocelot statistics Ocelot loops over memory regions to gather stats on different ports. These regions are mostly continuous, and are ordered. This patch set uses that information to break the stats reads into regions that can get read in bulk. The motiviation is for general cleanup, but also for SPI. Performing two back-to-back reads on a SPI bus require toggling the CS line, holding, re-toggling the CS line, sending 3 address bytes, sending N padding bytes, then actually performing the read. Bulk reads could reduce almost all of that overhead, but require that the reads are performed via regmap_bulk_read. Verified with eth0 hooked up to the CPU port: NIC statistics: Good Rx Frames: 905 Rx Octets: 78848 Good Tx Frames: 691 Tx Octets: 52516 Rx + Tx 65-127 Octet Frames: 1574 Rx + Tx 128-255 Octet Frames: 22 Net Octets: 131364 Rx DMA chan 0: head_enqueue: 1 Rx DMA chan 0: tail_enqueue: 1032 Rx DMA chan 0: busy_dequeue: 628 Rx DMA chan 0: good_dequeue: 905 Tx DMA chan 0: head_enqueue: 346 Tx DMA chan 0: tail_enqueue: 345 Tx DMA chan 0: misqueued: 345 Tx DMA chan 0: empty_dequeue: 346 Tx DMA chan 0: good_dequeue: 691 p00_rx_octets: 52516 p00_rx_unicast: 691 p00_rx_frames_65_to_127_octets: 691 p00_tx_octets: 78848 p00_tx_unicast: 905 p00_tx_frames_65_to_127_octets: 883 p00_tx_frames_128_255_octets: 22 p00_tx_green_prio_0: 905 And with swp2 connected to swp3 with STP enabled: NIC statistics: tx_packets: 379 tx_bytes: 19708 rx_packets: 1 rx_bytes: 46 rx_octets: 64 rx_multicast: 1 rx_frames_below_65_octets: 1 rx_classified_drops: 1 tx_octets: 44630 tx_multicast: 387 tx_broadcast: 290 tx_frames_below_65_octets: 379 tx_frames_65_to_127_octets: 294 tx_frames_128_255_octets: 4 tx_green_prio_0: 298 tx_green_prio_7: 379 NIC statistics: tx_packets: 1 tx_bytes: 52 rx_packets: 713 rx_bytes: 34148 rx_octets: 46982 rx_multicast: 407 rx_broadcast: 306 rx_frames_below_65_octets: 399 rx_frames_65_to_127_octets: 310 rx_frames_128_to_255_octets: 4 rx_classified_drops: 399 rx_green_prio_0: 314 tx_octets: 64 tx_multicast: 1 tx_frames_below_65_octets: 1 tx_green_prio_7: 1 v1 > v2: reword commit messages v2 > v3: correctly mark this for net-next when sending v3 > v4: calloc array instead of zalloc per review v4 > v5: Apply CR suggestions for whitespace Fix calloc / zalloc mixup Properly destroy workqueues Add third commit to split long macros v5 > v6: Fix functionality - v5 was improperly tested Add bugfix for ethtool mutex lock Remove unnecessary ethtool stats reads v6 > v7: Remove mutex bug patch that was applied via net Rename function based on CR Add missed error check ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Colin Foster authored
Create and utilize bulk regmap reads instead of single access for gathering stats. The background reading of statistics happens frequently, and over a few contiguous memory regions. High speed PCIe buses and MMIO access will probably see negligible performance increase. Lower speed buses like SPI and I2C could see significant performance increase, since the bus configuration and register access times account for a large percentage of data transfer time. Signed-off-by: Colin Foster <colin.foster@in-advantage.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Colin Foster authored
Regmap supports bulk register reads. Ocelot does not. This patch adds support for Ocelot to invoke bulk regmap reads. That will allow any driver that performs consecutive reads over memory regions to optimize that access. Signed-off-by: Colin Foster <colin.foster@in-advantage.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Colin Foster authored
In the ocelot.h file, several read / write macros were split across multiple lines, while others weren't. Split all macros that exceed the 80 character column width and match the style of the rest of the file. Signed-off-by: Colin Foster <colin.foster@in-advantage.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Colin Foster authored
The ocelot_update_stats function only needs to read from one port, yet it was updating the stats for all ports. Update to only read the stats that are necessary. Signed-off-by: Colin Foster <colin.foster@in-advantage.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David Ahern authored
Add reasons to __udp6_lib_rcv for skb drops. The only twist is that the NO_SOCKET takes precedence over the CSUM or other counters for that path (motivation behind this patch - csum counter was misleading). Signed-off-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Joseph CHAMG says: ==================== ADD DM9051 ETHERNET DRIVER DM9051 is a spi interface chip, need cs/mosi/miso/clock with an interrupt gpio pin ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Joseph CHAMG authored
Add davicom dm9051 spi ethernet driver, The driver work for the device platform which has the spi master Signed-off-by: Joseph CHAMG <josright123@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Joseph CHAMG authored
This is a new yaml base data file for configure davicom dm9051 with device tree Signed-off-by: Joseph CHAMG <josright123@gmail.com> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Tony Lu authored
The previous patch introduces a lock-free version of smc_tx_work() to solve unnecessary lock contention, which is expected to be held lock. So this adds comment to remind people to keep an eye out for locks. Suggested-by: Stefan Raspl <raspl@linux.ibm.com> Signed-off-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Kalash Nainwal authored
Generate RTM_NEWROUTE netlink notification when the route preference changes on an existing kernel generated default route in response to RA messages. Currently netlink notifications are generated only when this route is added or deleted but not when the route preference changes, which can cause userspace routing application state to go out of sync with kernel. Signed-off-by: Kalash Nainwal <kalash@arista.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Davide Caratti authored
in current Linux, MTU policing does not take into account that packets at the TC ingress have the L2 header pulled. Thus, the same TC police action (with the same value of tcfp_mtu) behaves differently for ingress/egress. In addition, the full GSO size is compared to tcfp_mtu: as a consequence, the policer drops GSO packets even when individual segments have the L2 + L3 + L4 + payload length below the configured valued of tcfp_mtu. Improve the accuracy of MTU policing as follows: - account for mac_len for non-GSO packets at TC ingress. - compare MTU threshold with the segmented size for GSO packets. Also, add a kselftest that verifies the correct behavior. Signed-off-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 13 Feb, 2022 10 commits
-
-
Kees Cook authored
With GCC 12, -Wstringop-overread was warning about an implicit cast from char[6] to char[8]. However, the extra 2 bytes are always thrown away, alignment doesn't matter, and the risk of hitting the edge of unallocated memory has been accepted, so this prototype can just be converted to a regular char *. Silences: net/core/dev.c: In function ‘bpf_prog_run_generic_xdp’: net/core/dev.c:4618:21: warning: ‘ether_addr_equal_64bits’ reading 8 bytes from a region of size 6 [-Wstringop-overread] 4618 | orig_host = ether_addr_equal_64bits(eth->h_dest, > skb->dev->dev_addr); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ net/core/dev.c:4618:21: note: referencing argument 1 of type ‘const u8[8]’ {aka ‘const unsigned char[8]’} net/core/dev.c:4618:21: note: referencing argument 2 of type ‘const u8[8]’ {aka ‘const unsigned char[8]’} In file included from net/core/dev.c:91: include/linux/etherdevice.h:375:20: note: in a call to function ‘ether_addr_equal_64bits’ 375 | static inline bool ether_addr_equal_64bits(const u8 addr1[6+2], | ^~~~~~~~~~~~~~~~~~~~~~~ Reported-by: Marc Kleine-Budde <mkl@pengutronix.de> Tested-by: Marc Kleine-Budde <mkl@pengutronix.de> Link: https://lore.kernel.org/netdev/20220212090811.uuzk6d76agw2vv73@pengutronix.de Cc: Jakub Kicinski <kuba@kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Horatiu Vultur authored
When CONFIG_IPV6 is not set, then the linking of the lan966x driver fails with the following error: drivers/net/ethernet/microchip/lan966x/lan966x_main.c:444: undefined reference to `ipv6_mc_check_mld' The fix consists in adding a check also for IS_ENABLED(CONFIG_IPV6) Fixes: 47aeea0d ("net: lan966x: Implement the callback SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Horatiu Vultur authored
When CONFIG_PTP_1588_CLOCK is compiled as a module, then the linking of the lan966x fails because it can't find references to the following functions 'ptp_clock_index', 'ptp_clock_register' and 'ptp_clock_unregister' The fix consists in adding CONFIG_PTP_1588_CLOCK_OPTIONAL as a dependency for the driver. Fixes: d0964594 ("net: lan966x: Add support for ptp clocks") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Raju Lakkaraju says: ==================== net: lan743x: PCI11010 / PCI11414 devices Enhancements This patch series adds support of the Ethernet function of the PCI11010 / PCI11414 devices to the LAN743x driver. The PCI1xxxx family of devices consists of a PCIe switch with a variety of embedded PCI endpoints on its downstream ports. The PCI11010 / PCI11414 devices include an Ethernet 10/100/1000/2500 function as one of those embedded endpoints. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Raju Lakkaraju authored
Add support for Clause-45 MDIO PHY management Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Raju Lakkaraju authored
This change facilitates the selection between SGMII and (R)GIII interfaces Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Raju Lakkaraju authored
Increase MSI / MSI-X vectors supported from 8 to 16 and Interrupt De-assertion timers from 8 to 10 Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Raju Lakkaraju authored
Add support for 4 Tx queues Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Raju Lakkaraju authored
PCI11010/PCI11414 devices are enhancement of Ethernet LAN743x chip family. Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
M Chetan Kumar authored
This patch enables Intel M.2 7360 WWAN card support on IOSM Driver. Control path implementation is a reuse whereas data path implementation it uses a different protocol called as MUX Aggregation. The major portion of this patch covers the MUX Aggregation protocol implementation used for IP traffic communication. For M.2 7360 WWAN card, driver exposes 2 wwan AT ports for control communication. The user space application or the modem manager to use wwan AT port for data path establishment. During probe, driver reads the mux protocol device capability register to know the mux protocol version supported by device. Base on which the right mux protocol is initialized for data path communication. An overview of an Aggregation Protocol 1> An IP packet is encapsulated with 16 octet padding header to form a Datagram & the start offset of the Datagram is indexed into Datagram Header (DH). 2> Multiple such Datagrams are composed & the start offset of each DH is indexed into Datagram Table Header (DTH). 3> The Datagram Table (DT) is IP session specific & table_length item in DTH holds the number of composed datagram pertaining to that particular IP session. 4> And finally the offset of first DTH is indexed into DBH (Datagram Block Header). So in TX/RX flow Datagram Block (Datagram Block Header + Payload)is exchanged between driver & device. Signed-off-by: M Chetan Kumar <m.chetan.kumar@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 12 Feb, 2022 1 commit
-
-
Jakub Kicinski authored
This reverts commit 038fcdaf. Christophe points out div64_u64() and do_div() have different calling conventions. One updates the param, the other returns the result. Reported-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/all/056a7276-c6f0-cd7e-9e46-1d8507a0b6b1@wanadoo.fr/ Fixes: 038fcdaf ("net: ethernet: cavium: use div64_u64() instead of do_div()") Link: https://lore.kernel.org/r/20220211020544.3262694-1-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 11 Feb, 2022 16 commits
-
-
Julia Lawall authored
Platform_driver probe functions aren't called with locks held and thus don't need GFP_ATOMIC. Use GFP_KERNEL instead. Problem found with Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Link: https://lore.kernel.org/r/20220210204223.104181-1-Julia.Lawall@inria.frSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Hariprasad Kelam authored
This patch fixes below error by using proper data type. drivers/net/ethernet/marvell/octeontx2/af/rpm.c: In function 'rpm_cfg_pfc_quanta_thresh': include/linux/find.h:40:23: error: array subscript 'long unsigned int[0]' is partly outside array bounds of 'u16[1]' {aka 'short unsigned int[1]'} [-Werror=array-bounds] 40 | val = *addr & GENMASK(size - 1, offset); Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Link: https://lore.kernel.org/r/20220211155539.13931-1-hkelam@marvell.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
David S. Miller authored
Merge tag 'wireless-next-2022-02-11' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next wireless-next patches for v5.18 First set of patches for v5.18, with both wireless and stack patches. rtw89 now has AP mode support and wcn36xx has survey support. But otherwise pretty normal. Major changes: ath11k * add LDPC FEC type in 802.11 radiotap header * enable RX PPDU stats in monitor co-exist mode wcn36xx * implement survey reporting brcmfmac * add CYW43570 PCIE device rtw88 * rtw8821c: enable RFE 6 devices rtw89 * AP mode support mt76 * mt7916 support * background radar detection support
-
David S. Miller authored
Eric Dumazet says: ==================== ipv6: remove addrconf reliance on loopback Second patch in this series removes IPv6 requirement about the netns loopback device being the last device being dismantled. This was needed because rt6_uncached_list_flush_dev() and ip6_dst_ifdown() had to switch dst dev to a known device (loopback). Instead of loopback, we can use the (hidden) blackhole_netdev which is also always there. This will allow future simplfications of netdev_run_to() and other parts of the stack like default_device_exit_batch(). Last two patches are optimizations for both IP families. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
This is an optimization to keep the per-cpu lists as short as possible: Whenever rt_flush_dev() changes one rtable dst.dev matching the disappearing device, it can can transfer the object to a quarantine list, waiting for a final rt_del_uncached_list(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
This is an optimization to keep the per-cpu lists as short as possible: Whenever rt6_uncached_list_flush_dev() changes one rt6_info matching the disappearing device, it can can transfer the object to a quarantine list, waiting for a final rt6_uncached_list_del(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
IPv6 addrconf notifiers wants the loopback device to be the last device being dismantled at netns deletion. This caused many limitations and work arounds. Back in linux-5.3, Mahesh added a per host blackhole_netdev that can be used whenever we need to make sure objects no longer refer to a disappearing device. If we attach to blackhole_netdev an ip6_ptr (allocate an idev), then we can use this special device (which is never freed) in place of the loopback_dev (which can be freed). This will permit improvements in netdev_run_todo() and other parts of the stack where had steps to make sure loopback_dev was the last device to disappear. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
This counter has never been visible, there is little point trying to maintain it. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Holger Brunck authored
The mv88e6352, mv88e6240 and mv88e6176 have a serdes interface. This patch allows to configure the output swing to a desired value in the phy-handle of the port. The value which is peak to peak has to be specified in microvolts. As the chips only supports eight dedicated values we return EINVAL if the value in the DTS does not match one of these values. Signed-off-by: Holger Brunck <holger.brunck@hitachienergy.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Marek Behún <kabel@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Marek Behún authored
Common PHYs and network PCSes often have the possibility to specify peak-to-peak voltage on the differential pair - the default voltage sometimes needs to be changed for a particular board. Add properties `tx-p2p-microvolt` and `tx-p2p-microvolt-names` for this purpose. The second property is needed to specify the mode for the corresponding voltage in the `tx-p2p-microvolt` property, if the voltage is to be used only for speficic mode. More voltage-mode pairs can be specified. Example usage with only one voltage (it will be used for all supported PHY modes, the `tx-p2p-microvolt-names` property is not needed in this case): tx-p2p-microvolt = <915000>; Example usage with voltages for multiple modes: tx-p2p-microvolt = <915000>, <1100000>, <1200000>; tx-p2p-microvolt-names = "2500base-x", "usb", "pcie"; Add these properties into a separate file phy/transmit-amplitude.yaml, which should be referenced by any binding that uses it. Signed-off-by: Marek Behún <kabel@kernel.org> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Guillaume Nault authored
The ->rtm_tos option is normally used to route packets based on both the destination address and the DS field. However it's ignored for IPv6 routes. Setting ->rtm_tos for IPv6 is thus invalid as the route is going to work only on the destination address anyway, so it won't behave as specified. Suggested-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Vladimir Oltean says: ==================== More aggressive DSA cleanup This series deletes some code which is apparently not needed. I've had these patches in my tree for a while, and testing on my boards didn't reveal any issues. Compared to the RFC v1 series, the only change is the addition of patch 3. https://patchwork.kernel.org/project/netdevbpf/cover/20220107184842.550334-1-vladimir.oltean@nxp.com/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vladimir Oltean authored
Since commit 2f1e8ea7 ("net: dsa: link interfaces with the DSA master to get rid of lockdep warnings"), suggested by Cong Wang, the DSA interfaces and their master have different dev->nested_level, which makes netif_addr_lock() stop complaining about potentially recursive locking on the same lock class. So we no longer need DSA slave interfaces to have their own lockdep class. Cc: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vladimir Oltean authored
Since commit 2f1e8ea7 ("net: dsa: link interfaces with the DSA master to get rid of lockdep warnings"), suggested by Cong Wang, the DSA interfaces and their master have different dev->nested_level, which makes netif_addr_lock() stop complaining about potentially recursive locking on the same lock class. So we no longer need DSA masters to have their own lockdep class. Cc: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vladimir Oltean authored
There are no legacy ports, DSA registers a devlink instance with ports unconditionally for all switch drivers. Therefore, delete the old-style ndo operations used for determining bridge forwarding domains. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
D. Wythe says: ==================== net/smc: Optimizing performance in short-lived scenarios This patch set aims to optimizing performance of SMC in short-lived links scenarios, which is quite unsatisfactory right now. In our benchmark, we test it with follow scripts: ./wrk -c 10000 -t 4 -H 'Connection: Close' -d 20 http://smc-server Current performance figures like that: Running 20s test @ http://11.213.45.6 4 threads and 10000 connections 4956 requests in 20.06s, 3.24MB read Socket errors: connect 0, read 0, write 672, timeout 0 Requests/sec: 247.07 Transfer/sec: 165.28KB There are many reasons for this phenomenon, this patch set doesn't solve it all though, but it can be well alleviated with it in. Patch 1/5 (Make smc_tcp_listen_work() independent) : Separate smc_tcp_listen_work() from smc_listen_work(), make them independent of each other, the busy SMC handshake can not affect new TCP connections visit any more. Avoid discarding a large number of TCP connections after being overstock, which is undoubtedly raise the connection establishment time. Patch 2/5 (Limit SMC backlog connections): Since patch 1 has separated smc_tcp_listen_work() from smc_listen_work(), an unrestricted TCP accept have come into being. This patch try to put a limit on SMC backlog connections refers to implementation of TCP. Patch 3/5 (Limit SMC visits when handshake workqueue congested): Considering the complexity of SMC handshake right now, in short-lived links scenarios, this may not be the main scenario of SMC though, it's performance is still quite poor. This patch try to provide constraint on SMC handshake when handshake workqueue congested, which is the sign of SMC handshake stacking in our opinion. Patch 4/5 (Dynamic control handshake limitation by socket options) This patch allow applications dynamically control the ability of SMC handshake limitation. Since SMC don't support set SMC socket option before, this patch also have to support SMC's owns socket options. Patch 5/5 (Add global configure for handshake limitation by netlink) This patch provides a way to get benefit of handshake limitation without modifying any code for applications, which is quite useful for most existing applications. After this patch set, performance figures like that: Running 20s test @ http://11.213.45.6 4 threads and 10000 connections 693253 requests in 20.10s, 452.88MB read Requests/sec: 34488.13 Transfer/sec: 22.53MB That's a quite well performance improvement, about to 6 to 7 times in my environment. --- changelog: v1 -> v2: - fix compile warning - fix invalid dependencies in kconfig v2 -> v3: - correct spelling mistakes - fix useless variable declare v3 -> v4 - make smc_tcp_ls_wq be static v4 -> v5 - add dynamic control for SMC auto fallback by socket options - add global configure for SMC auto fallback through netlink v5 -> v6 - move auto fallback to net namespace scope - remove auto fallback attribute in SMC_GEN_SYS_INFO - add independent attributes for auto fallback v6 -> v7 - fix wording and the naming issues, rename 'auto fallback' to handshake limitation. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-