- 28 May, 2020 40 commits
-
-
Ronak Doshi authored
The vmxnet3 version 3 device supports checksum/TSO offload. Thus, vNIC to pNIC traffic can leverage hardware checksum/TSO offloads. However, vmxnet3 does not support checksum/TSO offload for Geneve/VXLAN-encapsulated packets. Thus, for a vNIC configured with an overlay, the guest stack must first segment the inner packet, compute the inner checksum for each segment, and encapsulate each segment before transmitting the packet via the vNIC. This results in a significant performance penalty. This patch enhances vmxnet3 to support Geneve/VXLAN TSO as well as checksum offload. Signed-off-by: Ronak Doshi <doshir@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
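A rough sketch of the per-packet work the guest does in software when the vNIC cannot offload: splitting the inner payload into MSS-sized segments and computing an RFC 1071 Internet checksum for each one. The function names are hypothetical, and a real stack also folds in the TCP/UDP pseudo-header and rebuilds the outer headers per segment, which this sketch omits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of segments the guest must build itself when the vNIC cannot
 * do TSO: ceil(payload_len / mss). */
static size_t segments_needed(size_t payload_len, size_t mss)
{
    return (payload_len + mss - 1) / mss;
}

/* RFC 1071 Internet checksum: ones'-complement sum of 16-bit big-endian
 * words, carries folded back in, result complemented. Without checksum
 * offload, this runs once per inner segment. */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }
    if (len == 1)               /* odd trailing byte is padded with zero */
        sum += (uint32_t)data[0] << 8;
    while (sum >> 16)           /* fold carries into the low 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

For example, a 64000-byte inner payload with an MSS of 1448 becomes 45 segments, each needing its own checksum and encapsulation.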
-
Ronak Doshi authored
With vmxnet3 version 4, the emulation supports multiqueue (RSS) for UDP and ESP traffic. A guest can enable/disable RSS for UDP/ESP over IPv4/IPv6 by issuing the commands introduced in this patch. ESP over IPv6 is not yet supported in this patch. This patch also implements the get_rss_hash_opts and set_rss_hash_opts methods to allow querying and configuring different Rx flow hash configurations. Signed-off-by: Ronak Doshi <doshir@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
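A minimal sketch of how a set_rss_hash_opts-style handler might map a requested set of hash fields for UDP or ESP to "RSS on addresses only" versus "RSS including L4 bytes". The flag names are hypothetical and defined locally (they loosely mirror ethtool's RXH_* bits), and the assumption that only these two field sets are accepted is illustrative, not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical header-field flags a caller may ask the NIC to hash. */
enum {
    HASH_IP_SRC   = 1u << 0,
    HASH_IP_DST   = 1u << 1,
    HASH_L4_B_0_1 = 1u << 2,  /* L4 bytes 0-1 (UDP source port / SPI high half) */
    HASH_L4_B_2_3 = 1u << 3,  /* L4 bytes 2-3 (UDP dest port / SPI low half) */
};

enum rss_mode { RSS_INVALID = -1, RSS_2TUPLE = 0, RSS_4TUPLE = 1 };

/* Addresses only: L4 RSS disabled. Addresses plus both L4 halves: L4 RSS
 * enabled. Anything else is rejected in this sketch. */
static enum rss_mode classify_hash_request(uint32_t fields)
{
    const uint32_t l3 = HASH_IP_SRC | HASH_IP_DST;
    const uint32_t l4 = HASH_L4_B_0_1 | HASH_L4_B_2_3;

    if (fields == l3)
        return RSS_2TUPLE;
    if (fields == (l3 | l4))
        return RSS_4TUPLE;
    return RSS_INVALID;
}
```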
-
Ronak Doshi authored
vmxnet3 is currently at version 3, and this patch begins the preparation to accommodate the changes for version 4. Introduce utility macros for vmxnet3 version 4 comparison and update the copyright information. Signed-off-by: Ronak Doshi <doshir@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
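Version-comparison macros of this kind typically gate new code paths on the revision negotiated with the device. The sketch below is hypothetical: `struct demo_adapter` and the `DEMO_VERSION_GE_*` names are illustrative and are not the driver's actual definitions.

```c
#include <assert.h>

/* Hypothetical adapter holding the negotiated device revision (1-based). */
struct demo_adapter {
    unsigned int version;
};

/* Hypothetical helpers: true when the device speaks at least that revision. */
#define DEMO_VERSION_GE_3(a) ((a)->version >= 3)
#define DEMO_VERSION_GE_4(a) ((a)->version >= 4)
```

A caller would then wrap version 4 features, such as UDP/ESP RSS, in `if (DEMO_VERSION_GE_4(adapter))` so that older devices keep the version 3 behavior.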
-
Arnd Bergmann authored
'nic_data' is no longer used outside of the #ifdef block in efx_ef10_set_mac_address: drivers/net/ethernet/sfc/ef10.c:3231:28: error: unused variable 'nic_data' [-Werror,-Wunused-variable] struct efx_ef10_nic_data *nic_data = efx->nic_data; Move the variable into a local scope. Fixes: dfcabb07 ("sfc: move vport_id to struct efx_nic") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
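The general shape of this kind of fix, moving a variable into the `#ifdef` block that is its only user, can be sketched as follows. All names here are hypothetical stand-ins, not the sfc code.

```c
#include <assert.h>

/* Hypothetical stand-ins for the driver's private structures. */
struct demo_nic_data {
    int vf_count;
};

struct demo_nic {
    struct demo_nic_data *nic_data;
};

#define DEMO_CONFIG_SRIOV /* hypothetical config option, enabled here */

/* Returns 1 if VFs would need their MAC state refreshed, else 0. */
static int demo_set_mac_address(struct demo_nic *efx)
{
    int need_vf_update = 0;

#ifdef DEMO_CONFIG_SRIOV
    {
        /* Declared in the only scope that uses it, so builds without
         * DEMO_CONFIG_SRIOV no longer hit -Werror=unused-variable. */
        struct demo_nic_data *nic_data = efx->nic_data;

        if (nic_data->vf_count > 0)
            need_vf_update = 1;
    }
#endif
    return need_vf_update;
}
```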
-
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
David S. Miller authored
Jeff Kirsher says: ==================== 100GbE Intel Wired LAN Driver Updates 2020-05-27 This series contains updates to the ice driver only. Jesse fixes a number of issues, starting with the remaining signed versus unsigned comparison issues. Jesse also removes an unused define and fixes the implementation of the manage MAC write command, simplifying it by using a simple array to represent the MAC address when writing it. Paul fixes the setting of the VF default LAN address by removing a check that assumed the address had been deleted and zeroed. Surabhi prevents memory leaks on filter management initialization failures and on queue initialization and buffer allocation failures. Brett adds additional receive error counters that are reported by ethtool, and fixes the enabling and disabling of VLAN stripping when the PVID has been set. Evan fixes a race condition between the firmware and software, which can occur between the admin queue setup and the first command sent. Marta fixes the driver so that when XDP transmit rings are destroyed, the XDP transmit queues are destroyed as well. Marta also updates the statistics when XDP transmit programs are loaded and packets are sent, and changes the number of XDP transmit queues to match the number of receive queues instead of the number of transmit queues. Bruce avoids undefined behavior by not writing the 8-bit element init_q_state with the associated internal-to-hardware field, which is 122 bits. Anirudh (Ani) refactors the receive checksum checks. Krzysztof notifies the user if the fill queue is not long enough to prepare all buffers before packet processing starts, and allocates the buffers during the NAPI poll. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Christoph Hellwig says: ==================== remove most callers of kernel_setsockopt v3 This series removes most callers of the kernel_setsockopt functions, and instead switches their users to small functions that implement setting a sockopt directly using a normal kernel function call with type safety and all the other benefits of not having a function call. In some cases these functions seem pretty heavy-handed, as they do a lock_sock even for just setting a single variable, but this mirrors the real setsockopt implementation, unlike a few drivers that just set the fields directly. Changes since v2: - drop the separately merged kernel_getopt_removal - drop the sctp patches, as there is conflicting cleanup going on - add an additional ACK for the rxrpc changes Changes since v1: - use ->getname for sctp sockets in dlm - add a new ->bind_add struct proto method for dlm/sctp - switch the ipv6 and remaining sctp helpers to inline functions so that the ipv6 and sctp modules are not pulled in by any module that could potentially use ipv6 or sctp connections - remove arguments to various sock_* helpers that are always used with the same constant arguments ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
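The difference between the untyped setsockopt path and a small typed helper can be modeled in userspace C. This is a minimal model, not kernel code: `struct demo_sock`, `demo_setsockopt`, and `demo_sock_set_nodelay` are hypothetical, and a pthread mutex stands in for lock_sock()/release_sock().

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified socket: a lock plus one option field. */
struct demo_sock {
    pthread_mutex_t lock;
    bool nodelay;
};

enum { DEMO_OPT_NODELAY = 1 };

/* Generic path: option chosen by number, value passed as untyped bytes,
 * so the length and option name can only be checked at run time. */
static int demo_setsockopt(struct demo_sock *sk, int optname,
                           const void *optval, size_t optlen)
{
    int val;

    if (optname != DEMO_OPT_NODELAY || optlen < sizeof(val))
        return -1;
    memcpy(&val, optval, sizeof(val));
    pthread_mutex_lock(&sk->lock);
    sk->nodelay = val != 0;
    pthread_mutex_unlock(&sk->lock);
    return 0;
}

/* Typed helper: nothing to encode or validate, but it still takes the
 * socket lock, mirroring what the real setsockopt path does. */
static void demo_sock_set_nodelay(struct demo_sock *sk)
{
    pthread_mutex_lock(&sk->lock);
    sk->nodelay = true;
    pthread_mutex_unlock(&sk->lock);
}
```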
-
Christoph Hellwig authored
Avoid using kernel_setsockopt for the TIPC_IMPORTANCE option when we can just use the internal helper. The only change needed is to pass a struct sock instead of tipc_sock, which is private to socket.c. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Christoph Hellwig authored
Add a helper to directly set the RXRPC_MIN_SECURITY_LEVEL sockopt from kernel space without going through a fake uaccess. Thanks to David Howells for the documentation updates. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Christoph Hellwig authored
Add a helper to directly set the IPV6_RECVPKTINFO sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Christoph Hellwig authored
Add a helper to directly set the IPV6_ADDR_PREFERENCES sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Christoph Hellwig authored
Add a helper to directly set the IPV6_RECVERR sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Christoph Hellwig authored
Add a helper to directly set the IPV6_V6ONLY sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Christoph Hellwig authored
Add a helper to directly set the IP_PKTINFO sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Christoph Hellwig authored
Add a helper to directly set the IP_MTU_DISCOVER sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Howells <dhowells@redhat.com> [rxrpc bits] Signed-off-by: David S. Miller <davem@davemloft.net>
-
Christoph Hellwig authored
Add a helper to directly set the IP_RECVERR sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Christoph Hellwig authored
Add a helper to directly set the IP_FREEBIND sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Christoph Hellwig authored
Add a helper to directly set the IP_TOS sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Christoph Hellwig authored
Add a helper to directly set the TCP_KEEPCNT sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Christoph Hellwig authored
Add a helper to directly set the TCP_KEEPINTVL sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Christoph Hellwig authored
Add a helper to directly set the TCP_KEEPIDLE sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
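For reference, the keepalive options covered by the TCP_KEEPCNT, TCP_KEEPINTVL, and TCP_KEEPIDLE helpers above can be exercised from userspace with plain setsockopt/getsockopt. This is a Linux-specific illustration of what the options mean, not the kernel-side helpers; `set_keepalive` and `get_int_opt` are hypothetical names.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Enable keepalive, then set the idle time before the first probe (s),
 * the interval between probes (s), and the number of unanswered probes
 * before the connection is dropped. Returns 0 on success, -1 on error. */
static int set_keepalive(int fd, int idle_s, int intvl_s, int cnt)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof(idle_s)) < 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_s, sizeof(intvl_s)) < 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) < 0)
        return -1;
    return 0;
}

/* Read back one integer socket option, or -1 on error. */
static int get_int_opt(int fd, int level, int name)
{
    int val;
    socklen_t len = sizeof(val);

    return getsockopt(fd, level, name, &val, &len) < 0 ? -1 : val;
}
```

The kernel rejects out-of-range values (for example an idle time below 1 second) with EINVAL, so a typed kernel helper for these options still returns an error code.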
-
Christoph Hellwig authored
Add a helper to directly set the TCP_USER_TIMEOUT sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Christoph Hellwig authored
Add a helper to directly set the TCP_SYNCNT sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Christoph Hellwig authored
Add a helper to directly set the TCP_QUICKACK sockopt from kernel space without going through a fake uaccess. Clean up the callers to avoid pointless wrappers now that this is a simple function call. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Christoph Hellwig authored
Add a helper to directly set the TCP_NODELAY sockopt from kernel space without going through a fake uaccess. Clean up the callers to avoid pointless wrappers now that this is a simple function call. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Sagi Grimberg <sagi@grimberg.me> Acked-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Christoph Hellwig authored
Add a helper to directly set the TCP_CORK sockopt from kernel space without going through a fake uaccess. Clean up the callers to avoid pointless wrappers now that this is a simple function call. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Christoph Hellwig authored
Add a helper to directly set the SO_REUSEPORT sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Christoph Hellwig authored
Add a helper to directly set the SO_RCVBUFFORCE sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Christoph Hellwig authored
Add a helper to directly set the SO_KEEPALIVE sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Christoph Hellwig authored
Add a helper to directly enable timestamps instead of setting the SO_TIMESTAMP* sockopts from kernel space and going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Christoph Hellwig authored
Add a helper to directly set the SO_BINDTOIFINDEX sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Christoph Hellwig authored
Add a helper to directly set the SO_SNDTIMEO_NEW sockopt from kernel space without going through a fake uaccess. The interface is simplified to only pass the seconds value, as that is the only thing needed at the moment. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
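A userspace illustration of a seconds-only send timeout, mirroring the simplified interface described here (the microseconds part is always zero). It uses the standard SO_SNDTIMEO option with a `struct timeval`; the helper names are hypothetical.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Set a send timeout in whole seconds; returns 0 on success, -1 on error. */
static int set_send_timeout_secs(int fd, long secs)
{
    struct timeval tv = { .tv_sec = secs, .tv_usec = 0 };

    return setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv));
}

/* Read the send timeout back in whole seconds, or -1 on error. */
static long get_send_timeout_secs(int fd)
{
    struct timeval tv;
    socklen_t len = sizeof(tv);

    if (getsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, &len) < 0)
        return -1;
    return (long)tv.tv_sec;
}
```

A blocking send on the socket then fails with EAGAIN/EWOULDBLOCK once the timeout expires without any data being sent.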
-
Christoph Hellwig authored
Add a helper to directly set the SO_PRIORITY sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Christoph Hellwig authored
Add a helper to directly set the SO_LINGER sockopt from kernel space, with onoff set to true and a linger time of 0, without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: David S. Miller <davem@davemloft.net>
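In userspace terms, "onoff set to true and a linger time of 0" is the classic abortive-close setting: on a connected TCP socket, close() discards any unsent data and sends a RST instead of going through the normal FIN handshake. A small illustration with hypothetical helper names:

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Enable linger with a zero timeout (abortive close). */
static int set_linger_zero(int fd)
{
    struct linger l = { .l_onoff = 1, .l_linger = 0 };

    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &l, sizeof(l));
}

/* Returns 1 if linger is on with a zero timeout, 0 if not, -1 on error. */
static int linger_is_zero(int fd)
{
    struct linger l;
    socklen_t len = sizeof(l);

    if (getsockopt(fd, SOL_SOCKET, SO_LINGER, &l, &len) < 0)
        return -1;
    return l.l_onoff && l.l_linger == 0;
}
```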
-
Christoph Hellwig authored
Add a helper to directly set the SO_REUSEADDR sockopt from kernel space without going through a fake uaccess. For this the iscsi target now has to formally depend on inet to avoid a mostly theoretical compile failure. For actual operation it already did depend on having ipv4 or ipv6 support. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
David S. Miller authored
Saeed Mahameed says: ==================== mlx5-updates-2020-05-26 Updates highlights:
1) From Vu Pham (8): Support VM traffic failover with bonded VF representors and e-switch egress/ingress ACLs. This series introduces support for a Virtual Machine running I/O traffic over the direct/fast VF path and failing over to the slower paravirtualized path, using the following features:
[Topology diagram: inside the VM, a FAILOVER device sits over a VF PT device and a VIRTIO-NET device. In the HYPERVISOR, the virtio backend (virtio BE) sits on a macvtap over a host VF. The V-SWITCH (OVS) holds a bond of the PT VF representor and the host VF representor, and connects to the uplink representor.]
Summary: Problem statement: Currently, in the above topology, when a netfailover device is configured using VFs and eswitch VF representors, and traffic fails over to the stand-by VF, which is exposed to the guest VM through a macvtap device, the eswitch fails to switch the traffic to the stand-by VF representor. This occurs because the eswitch has no knowledge of the stand-by representor device.
Solution: Using the standard bonding driver, a bond netdevice is created over the VF representor devices used for offloading tc rules. Two VF representors are bonded together: one for the pass-through VF device and one for the stand-by VF device. With this solution, the mlx5 driver listens for failover events occurring at the bond device level and fails traffic over to whichever VF representor of the bond is active.
a. VM with a netfailover device over a VF pass-through (PT) device and a virtio-net paravirtualized device with the same MAC address, to handle failover traffic at the VM level.
b. Host bond in active-standby mode, with the lower devices being the VM VF PT representor and the representor of the 2nd VF, to handle failover traffic at the Hypervisor/V-Switch OVS level. - During the steady state (fast datapath): set the bond active device to be the VM PT VF representor. - During failover: apply bond failover to the second VF representor device, which connects to the VM non-accelerated path.
c. E-Switch ingress/egress ACL tables to support failover traffic at the E-Switch level.
I. E-Switch egress ACL with forward-to-vport rule: - By default, the eswitch vport egress ACL forwards packets to its counterpart NIC vport. - During port failover, an egress ACL forward-to-vport rule is added to the e-switch vport of the passive/inactive slave VF representor, to forward packets to the other e-switch vport, i.e. the active slave representor's e-switch vport, to handle egress "failover" traffic. - The lower change netdev event is used to detect that a representor is a lower dev (slave) of the bond and has become active; egress ACL forward-to-vport rules are then added on all other slave netdevs to forward to this representor's vport. - The upper change netdev event is used to detect a representor unslaving from the bond device, to delete its vport's egress ACL forward-to-vport rule.
II. E-Switch ingress ACL metadata reg_c for match: - Bonded representors' vports sharing a tc block have the same root ingress ACL table and a unique metadata for match. - Traffic from both representors' vports is tagged with the same unique metadata reg_c. - The upper change netdev event is used to detect a representor enslaving/unslaving from the bond device, to set up the shared root ingress ACL and unique metadata.
2) From Alex Vesker (2): Split RX and TX lock for parallel rule insertion in software steering.
3) From Eli Britstein (2): Optimize performance for IPv4/IPv6 ethertype: use the HW ip_version register rather than parsing eth frames for ethertype. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
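The egress forward-to-vport rule handling in item I can be modeled as a small state update over the bond's slaves. This is a hypothetical model for illustration only (`struct demo_bond` and its functions are not mlx5 code): when one representor becomes active, every other slave forwards to the active slave's vport, and a slave leaving the bond drops its rule.

```c
#include <assert.h>

#define DEMO_MAX_SLAVES 4

/* Hypothetical bond of VF representors. Each slave owns an e-switch vport
 * and may carry one egress forward-to-vport rule (-1 means no rule, i.e.
 * default forwarding to its counterpart NIC vport). */
struct demo_bond {
    int nslaves;
    int vport[DEMO_MAX_SLAVES];
    int fwd_to[DEMO_MAX_SLAVES];
};

/* Lower-change event: slave `active` became the active lower device. */
static void demo_bond_set_active(struct demo_bond *b, int active)
{
    for (int i = 0; i < b->nslaves; i++)
        b->fwd_to[i] = (i == active) ? -1 : b->vport[active];
}

/* Upper-change event: slave `idx` left the bond, so delete its rule. */
static void demo_bond_unslave(struct demo_bond *b, int idx)
{
    b->fwd_to[idx] = -1;
}
```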
-
Eric Dumazet authored
Make the tcp_ld_RTO_revert() helper available to IPv6, and implement RFC 6069. Quoting this RFC: 3. Connectivity Disruption Indication For Internet Protocol version 6 (IPv6) [RFC2460], the counterpart of the ICMP destination unreachable message of code 0 (net unreachable) and of code 1 (host unreachable) is the ICMPv6 destination unreachable message of code 0 (no route to destination) [RFC4443]. As with IPv4, a router should generate an ICMPv6 destination unreachable message of code 0 in response to a packet that cannot be delivered to its destination address because it lacks a matching entry in its routing table. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
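The quoted mapping can be written as a predicate: for IPv4, destination unreachable (type 3) with code 0 or 1; for IPv6, destination unreachable (type 1) with code 0. The helper name and locally defined constants below are hypothetical; the numeric values come from RFC 792 and RFC 4443.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ICMP/ICMPv6 type and code values (RFC 792, RFC 4443), defined locally. */
enum {
    DEMO_ICMP_DEST_UNREACH   = 3,
    DEMO_ICMP_NET_UNREACH    = 0,
    DEMO_ICMP_HOST_UNREACH   = 1,
    DEMO_ICMPV6_DEST_UNREACH = 1,
    DEMO_ICMPV6_NOROUTE      = 0,
};

/* Hypothetical check: does this ICMP message count as an RFC 6069
 * connectivity-disruption indication that may revert an RTO backoff? */
static bool is_connectivity_disruption(bool ipv6, uint8_t type, uint8_t code)
{
    if (ipv6)
        return type == DEMO_ICMPV6_DEST_UNREACH && code == DEMO_ICMPV6_NOROUTE;
    return type == DEMO_ICMP_DEST_UNREACH &&
           (code == DEMO_ICMP_NET_UNREACH || code == DEMO_ICMP_HOST_UNREACH);
}
```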
-
Vladimir Oltean authored
The SJA1105 switches, being AVB/TSN switches, provide hardware assist for the Credit-Based Shaper as described in the IEEE 802.1Q-2018 document. The first generation has 10 shapers, freely assignable to any of the 4 external ports and 8 traffic classes, and the second generation has 16 shapers. The Credit-Based Shaper tables are accessed through the dynamic reconfiguration interface, so we have to restore them manually after a switch reset. The tables are backed up by the static config only on P/Q/R/S, and we don't want to add custom code only for that family, since the procedure that is in place now works for both. Tested with the following commands:
data_rate_kbps=67000
port_transmit_rate_kbps=1000000
idleslope=$data_rate_kbps
sendslope=$(($idleslope - $port_transmit_rate_kbps))
locredit=$((-0x80000000))
hicredit=$((0x7fffffff))
tc qdisc add dev swp2 root handle 1: mqprio hw 0 num_tc 8 \
    map 0 1 2 3 4 5 6 7 \
    queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7
tc qdisc replace dev swp2 parent 1:1 cbs \
    idleslope $idleslope \
    sendslope $sendslope \
    hicredit $hicredit \
    locredit $locredit \
    offload 1
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
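The slope values in the test script follow from simple arithmetic: credit accrues at idleslope while the queue waits and drains at sendslope = idleslope - port rate while it transmits. A small worked check with the script's numbers (helper names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Credit drain rate while transmitting, in kbit/s (negative). */
static int64_t cbs_sendslope_kbps(int64_t idleslope_kbps, int64_t port_rate_kbps)
{
    return idleslope_kbps - port_rate_kbps;
}

/* Share of the port bandwidth reserved for the class, in per-mille. */
static int64_t cbs_share_permille(int64_t idleslope_kbps, int64_t port_rate_kbps)
{
    return idleslope_kbps * 1000 / port_rate_kbps;
}
```

With idleslope = 67000 kbit/s on a 1000000 kbit/s port, sendslope is -933000 kbit/s and the class is reserved 67 per-mille (6.7%) of the link.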
-
David Ahern authored
Add Nik's torture tests as a new set to stress the replace and cleanup paths. The torture test was created by Nikolay Aleksandrov; I then adapted it to the selftest and added an IPv6 version. Signed-off-by: David Ahern <dsahern@kernel.org> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Krzysztof Kazimierczak authored
If a UMEM is present on a queue when an interface/queue pair is being enabled, the driver will try to prepare the Rx buffers in advance to improve performance. However, if the fill queue is shorter than the HW Rx ring, the driver will report failure after getting the last address from the fill queue. This still lets the driver process the packets correctly during the NAPI poll, but leads to constant NAPI rescheduling. Not allocating the buffers in advance would result in a potential performance decrease. Commit d57d7642 ("xsk: Add API to check for available entries in FQ") provides an API that lets drivers check the number of addresses that the fill queue holds. Notify the user if the fill queue is not long enough to prepare all buffers before packet processing starts, and allocate the buffers during the NAPI poll. If the fill queue size is sufficient, prepare the Rx buffers in advance. Signed-off-by: Krzysztof Kazimierczak <krzysztof.kazimierczak@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
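The decision described here can be sketched as a single check at queue-enable time. This is a hypothetical model: `fq_entries` stands in for whatever the fill-queue availability API reports, and the threshold of one address per Rx descriptor is an assumption made for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical outcome of preparing a zero-copy Rx ring. */
enum rx_prep { RX_PREP_IN_ADVANCE, RX_PREP_DEFER_TO_NAPI };

/* Post all buffers up front if the fill queue can cover the whole ring;
 * otherwise tell the user and leave allocation to the NAPI poll. */
static enum rx_prep choose_rx_prep(unsigned int fq_entries,
                                   unsigned int rx_ring_count)
{
    if (fq_entries >= rx_ring_count)
        return RX_PREP_IN_ADVANCE;
    fprintf(stderr, "fill queue holds %u entries, Rx ring needs %u; "
            "allocating buffers during NAPI poll\n", fq_entries, rx_ring_count);
    return RX_PREP_DEFER_TO_NAPI;
}
```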
-
Alex Vesker authored
Change the locking flow to support separate RX and TX locks; splitting the single lock in two allows inserting rules in parallel for the RX and TX parts of the FDB. Locking the dr_domain is done by taking both the RX domain and the TX domain locks; this is mostly used for control operations on the dr_domain. When inserting rules for RX or TX, only the single nic_domain RX or TX lock is used. Splitting the lock is safe since the RX and TX domains are logically separated from each other, and shared objects such as the send-ring and memory pool are protected by locks. Signed-off-by: Alex Vesker <valex@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
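The split-lock scheme can be modeled with two pthread mutexes. This is a hypothetical userspace model, not the mlx5 steering code: control operations take both locks in a fixed order (RX, then TX) and release them in reverse, while rule insertion takes only the lock for its own direction, so RX and TX inserts do not serialize against each other.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical per-direction domain with its own lock. */
struct demo_nic_domain {
    pthread_mutex_t lock;
};

struct demo_domain {
    struct demo_nic_domain rx;
    struct demo_nic_domain tx;
};

/* Whole-domain (control) lock: both directions, fixed order. */
static void demo_domain_lock(struct demo_domain *dmn)
{
    pthread_mutex_lock(&dmn->rx.lock);
    pthread_mutex_lock(&dmn->tx.lock);
}

static void demo_domain_unlock(struct demo_domain *dmn)
{
    pthread_mutex_unlock(&dmn->tx.lock);
    pthread_mutex_unlock(&dmn->rx.lock);
}

/* Rule-insertion lock: only the direction being modified. */
static void demo_nic_domain_lock(struct demo_nic_domain *nic)
{
    pthread_mutex_lock(&nic->lock);
}

static void demo_nic_domain_unlock(struct demo_nic_domain *nic)
{
    pthread_mutex_unlock(&nic->lock);
}
```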
-