- 12 Nov, 2019 1 commit
-
-
Heiner Kallweit authored
Currently, if network is re-started, we advertise all supported EEE modes, thus potentially overriding a manual adjustment the user made e.g. via ethtool. Be friendly to the user and preserve a manual setting on network re-start. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 11 Nov, 2019 27 commits
-
-
Xin Long authored
TUNNEL_OPTIONS_PRESENT (TUNNEL_GENEVE_OPT|TUNNEL_VXLAN_OPT| TUNNEL_ERSPAN_OPT) flags should be set only according to tb[LWTUNNEL_IP_OPTS], which is done in ip_tun_parse_opts(). When setting info key.tun_flags, the TUNNEL_OPTIONS_PRESENT bits in tb[LWTUNNEL_IP(6)_FLAGS] passed from users should be ignored. While at it, replace all (TUNNEL_GENEVE_OPT|TUNNEL_VXLAN_OPT| TUNNEL_ERSPAN_OPT) with 'TUNNEL_OPTIONS_PRESENT'. Fixes: 3093fbe7 ("route: Per route IP tunnel metadata via lightweight tunnel") Fixes: 32a2b002 ("ipv6: route: per route IP tunnel metadata via lightweight tunnel") Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Xin Long authored
erspan v1 has OPT_ERSPAN_INDEX while erspan v2 has OPT_ERSPAN_DIR and OPT_ERSPAN_HWID attributes, and they require different nlsize when dumping. So this patch is to get nlsize for erspan options properly according to erspan version. Fixes: b0a21810 ("lwtunnel: add options setting and dumping for erspan") Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Xin Long authored
As the new options added in kernel, all should always use strict parsing from the beginning with nla_parse_nested(), instead of nla_parse_nested_deprecated(). Fixes: b0a21810 ("lwtunnel: add options setting and dumping for erspan") Fixes: edf31cbb ("lwtunnel: add options setting and dumping for vxlan") Fixes: 4ece4778 ("lwtunnel: add options setting and dumping for geneve") Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Vladimir Oltean says: ==================== Accomodate DSA front-end into Ocelot After the nice "change-my-mind" discussion about Ocelot, Felix and LS1028A (which can be read here: https://lkml.org/lkml/2019/6/21/630), we have decided to take the route of reworking the Ocelot implementation in a way that is DSA-compatible. This is a large series, but hopefully is easy enough to digest, since it contains mostly code refactoring. What needs to be changed: - The struct net_device, phy_device needs to be isolated from Ocelot private structures (struct ocelot, struct ocelot_port). These will live as 1-to-1 equivalents to struct dsa_switch and struct dsa_port. - The function prototypes need to be compatible with DSA (of course, struct dsa_switch will become struct ocelot). - The CPU port needs to be assigned via a higher-level API, not hardcoded in the driver. What is going to be interesting is that the new DSA front-end of Ocelot will need to have features in lockstep with the DSA core itself. At the moment, some more advanced tc offloading features of Ocelot (tc-flower, etc) are not available in the DSA front-end due to lack of API in the DSA core. It also means that Ocelot practically re-implements large parts of DSA (although it is not a DSA switch per se) - see the FDB API for example. The code has been only compile-tested on Ocelot, since I don't have access to any VSC7514 hardware. It was proven to work on NXP LS1028A, which instantiates a DSA derivative of Ocelot. So I would like to ask Alex Belloni if you could confirm this series causes no regression on the Ocelot MIPS SoC. The goal is to get this rework upstream as quickly as possible, precisely because it is a large volume of code that risks gaining merge conflicts if we keep it for too long. This is but the first chunk of the LS1028A Felix DSA driver upstreaming. For those who are interested, the concept can be seen on my private Github repo, the user of this reworked Ocelot driver living under drivers/net/dsa/vitesse/: https://github.com/vladimiroltean/ls1028ardb-linux ==================== Acked-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vladimir Oltean authored
VSC7514 is a 10-port switch with 2 extra "CPU ports" (targets in the queuing subsystem for terminating traffic locally). There are 2 issues with hardcoding the CPU port as #10: - It is not clear which snippets of the code are configuring something for one of the CPU ports, and which snippets are just doing something related to the number of physical ports. - Actually any physical port can act as a CPU port connected to an external CPU (in addition to the local CPU). This is called NPI mode (Node Processor Interface) and is the way that the 6-port VSC9959 (Felix) switch is integrated inside NXP LS1028A (the "local management CPU" functionality is not used there). This patch makes it clear that the ocelot_bridge_stp_state_set function operates on the CPU port (by making it an implicit member of the bridging domain), and at the same time adds logic for the NPI port (aka a physical port) to play the role of a CPU port (it shouldn't be part of bridge_fwd_mask, as it's not explicitly enslaved to a bridge). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vladimir Oltean authored
Now that the places that configure routing destinations for the CPU port have been marked as such, allow callers to specify their own CPU port that is different than ocelot->num_phys_ports. A user will be the Felix DSA driver, where the CPU port is one of the physical ports (NPI mode). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vladimir Oltean authored
This will be called from the Felix DSA frontend, which will work in PHYLIB compatibility mode initially. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Claudiu Manoil authored
This is just common path code that belongs to ocelot_init, it has nothing to do with a specific SoC/board instance. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vladimir Oltean authored
Allow these functions to be called from the .port_enable and .port_disable callbacks of DSA. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vladimir Oltean authored
We need a function for the DSA front-end that does none of the net_device registration, but initializes the hardware ports. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vladimir Oltean authored
The VSC7514 switch (Ocelot) is a 10-port device, while VSC9959 (Felix) is 6-port. Therefore the VLAN filtering mask would be out of bounds when calling for this new switch. Fix that. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vladimir Oltean authored
Convert them into an implementation that can be called from DSA as well. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vladimir Oltean authored
The ocelot and ocelot_port structures will be used by a new DSA driver, so the ocelot_board.c file will have to allocate and work with a private structure (ocelot_port_private), which embeds the generic struct ocelot_port. This is because in DSA, at least one interface does not have a net_device, and the DSA driver API does not interact with that anyway. The ocelot_port structure is equivalent to dsa_port, and ocelot to dsa_switch. The members of ocelot_port which have an equivalent in dsa_port (such as dp->vlan_filtering) have been moved to ocelot_port_private. We want to enforce the coding convention that "ocelot_port" refers to the structure, and "port" refers to the integer index. One can retrieve the structure at any time from ocelot->ports[port]. The patch is large but only contains variable renaming and mechanical movement of fields from one structure to another. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vladimir Oltean authored
The ocelot_port structure has a net_device embedded in it, which makes it unsuitable for leaving it in the driver implementation functions. Leave ocelot_flower.c untouched. In that file, ocelot_port is used as an interface to the tc shared blocks. That will be addressed in the next patch. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vladimir Oltean authored
This is needed so that the Felix DSA front-end can call the Ocelot implementations. The implementation of the "mc_disabled" switchdev attribute has also been simplified by using the read-modify-write macro instead of open-coding that operation. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vladimir Oltean authored
This is needed in order to present a simpler prototype to the DSA front-end of ocelot. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vladimir Oltean authored
To be able to implement a DSA front-end over ocelot_fdb_add, ocelot_fdb_del, ocelot_fdb_dump, these need to have a simple function prototype that is independent of struct net_device, netlink skb, etc. So rename the ndo ops of the ocelot driver into ocelot_port_fdb_{add,del,dump}, and have them all call the abstract implementations. At the same time, refactor ocelot_port_fdb_do_dump into a function whose prototype is compatible with dsa_fdb_dump_cb_t, so that the do_dump implementations can live together and be called by the ocelot_fdb_dump through a function pointer. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vladimir Oltean authored
We need an implementation of these functions that is agnostic to the higher layer (switchdev or dsa). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vladimir Oltean authored
This patch transforms the ocelot_vlan_port_apply function ("apply what?") into 3 standalone functions: - ocelot_port_vlan_filtering - ocelot_port_set_native_vlan - ocelot_port_set_pvid These functions have a prototype that is better aligned to the DSA API. The function also had some static initialization (TPID, drop frames with multicast source MAC) which was not being changed from any place, so that was just moved to ocelot_probe_port (one of the 6 callers of ocelot_vlan_port_apply). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Iwan R Timmer says: ==================== net: dsa: mv88e6xxx: Add support for port mirroring This patch series add support for port mirroring in the mv88e6xx switch driver. The first patch changes the set_egress_port function to allow different egress ports for egress and ingress traffic. The second patch adds the actual code for port mirroring support. Tested on a 88E6176 with: tc qdisc add dev wan0 clsact tc filter add dev wan0 ingress matchall skip_sw \ action mirred egress mirror dev lan2 tc filter add dev wan0 egress matchall skip_sw \ action mirred egress mirror dev lan3 Changes in v3 - Use enum for egress traffic direction - Keep track of egress ports on mv88e6390 - Move booleans in struct for better structure packing Changes in v2 - Support mirroring egress and ingress traffic to different ports - Check for invalid configurations when multiple ports are mirrored ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Iwan R Timmer authored
Add support for configuring port mirroring through the cls_matchall classifier. We do a full ingress and/or egress capture towards a capture port. It allows setting a different capture port for ingress and egress traffic. It keeps track of the mirrored ports and the destination ports to prevent changes to the capture port while other ports are being mirrored. Signed-off-by: Iwan R Timmer <irtimmer@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Iwan R Timmer authored
Separate the configuration of the egress and ingress monitor port. This allows the port mirror functionality to do ingress and egress port mirroring to separate ports. Signed-off-by: Iwan R Timmer <irtimmer@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
John Efstathiades authored
The LAN743x Ethernet controller provides two independent PTP event channels. Each one can be used to generate a periodic output from the PTP clock. The output can be routed to any one of the available GPIO pins on the device. The PTP clock API can now be used to: - select any LAN743x GPIO pin to function as a periodic output - select either LAN743x PTP event channel to generate the output The LAN7430 has 4 GPIO pins that are multiplexed with its internal PHY LED control signals. A pin assigned to the LED control function will be assigned to the GPIO function if selected for PTP periodic output. Signed-off-by: John Efstathiades <john.efstathiades@pebblebay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Vladimir Oltean says: ==================== Unlock new potential in SJA1105 with PTP system timestamping The SJA1105 being an automotive switch means it is designed to live in a set-and-forget environment, far from the configure-at-runtime nature of Linux. Frequently resetting the switch to change its static config means it loses track of its PTP time, which is not good. This patch series implements PTP system timestamping for this switch (using the API introduced for SPI here: https://www.mail-archive.com/netdev@vger.kernel.org/msg316725.html), adding the following benefits to the driver: - When under control of a user space PTP servo loop (ptp4l, phc2sys), the loss of sync during a switch reset is much more manageable, and the switch still remains in the s2 (locked servo) state. - When synchronizing the switch using the software technique (based on reading clock A and writing the value to clock B, as opposed to relying on hardware timestamping), e.g. by using phc2sys, the sync accuracy is vastly improved due to the fact that the actual switch PTP time can now be more precisely correlated with something of better precision (CLOCK_REALTIME). The issue is that SPI transfers are inherently bad for measuring time with low jitter, but the newly introduced API aims to alleviate that issue somewhat. This series is also a requirement for a future patch set that adds full time-aware scheduling offload support for the switch. ==================== Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vladimir Oltean authored
The purpose here is to avoid ptp4l fail due to this condition: timed out while polling for tx timestamp increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug port 1: send peer delay request failed So either reset the switch before the management frame was sent, or after it was timestamped as well, but not in the middle. The condition may arise either due to a true timeout (i.e. because re-uploading the static config takes time), or due to the TX timestamp actually getting lost due to reset. For the former we can increase tx_timestamp_timeout in userspace, for the latter we need this patch. Locking all traffic during switch reset does not make sense at all, though. Forcing all CPU-originated traffic to potentially block waiting for a sleepable context to send > 800 bytes over SPI is not a good idea. Flows that are autonomously forwarded by the switch will get dropped anyway during switch reset no matter what. So just let all other CPU-originated traffic be dropped as well. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vladimir Oltean authored
The PTP time of the switch is not preserved when uploading a new static configuration. Work around this hardware oddity by reading its PTP time before a static config upload, and restoring it afterwards. Static config changes are expected to occur at runtime even in scenarios directly related to PTP, i.e. the Time-Aware Scheduler of the switch is programmed in this way. Perhaps the larger implication of this patch is that the PTP .gettimex64 and .settime functions need to be exposed to sja1105_main.c, where the PTP lock needs to be held during this entire process. So their core implementation needs to move to some common functions which get exposed in sja1105_ptp.h. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vladimir Oltean authored
Through the PTP_SYS_OFFSET_EXTENDED ioctl, it is possible for userspace applications (i.e. phc2sys) to compensate for the delays incurred while reading the PHC's time. The task itself of taking the software timestamp is delegated to the SPI subsystem, through the newly introduced API in struct spi_transfer. The goal is to cross-timestamp I/O operations on the switch's PTP clock with values in the local system clock (CLOCK_REALTIME). For that we need to understand a bit of the hardware internals. The 'read PTP time' message is a 12 byte structure, first 4 bytes of which represent the SPI header, and the last 8 bytes represent the 64-bit PTP time. The switch itself starts processing the command immediately after receiving the last bit of the address, i.e. at the middle of byte 3 (last byte of header). The PTP time is shadowed to a buffer register in the switch, and retrieved atomically during the subsequent SPI frames. A similar thing goes on for the 'write PTP time' message, although in that case the switch waits until the 64-bit PTP time becomes fully available before taking any action. So the byte that needs to be software-timestamped is byte 11 (last) of the transfer. The patch creates a common (and local) sja1105_xfer implementation for the SPI I/O, and offers 3 front-ends: - sja1105_xfer_u32 and sja1105_xfer_u64: these are capable of optionally requesting a PTP timestamp - sja1105_xfer_buf: this is for large transfers (e.g. the static config buffer) and other misc data, and there is no point in giving timestamping capabilities to this. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 10 Nov, 2019 7 commits
-
-
David S. Miller authored
Heiner Kallweit says: ==================== r8169: improve PHY configuration This series adds helpers to improve and simplify the PHY configuration on various network chip versions. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Heiner Kallweit authored
rtl8168c_4_hw_phy_config() duplicates rtl8168c_3_hw_phy_config(), so we can remove the function. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Heiner Kallweit authored
Certain integrated PHY's from RTL8168d support extended pages. On page 0x0007 the number of the extended page is written to register 0x1e, then the registers on the extended page can be accessed. Add a helper for this to improve readability and simplify the code. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Heiner Kallweit authored
Use the phylib MDIO access functions in more places to simplify the code. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Heiner Kallweit authored
Integrated PHY's from RTL8168d support an indirect access method for PHY parameters. On page 0x0005 parameter number is written to register 0x05, then the parameter can be accessed via register 0x06. Add a helper for this to improve readability and simplify the code. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Heiner Kallweit authored
Integrated PHY's from RTL8168g support an indirect access method for PHY parameters. On page 0x0a43 parameter number is written to register 0x13, then the parameter can be accessed via register 0x14. Add a helper for this to improve readability and simplify the code. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Russell King authored
The current upstream interface is an all-or-nothing, which is sub-optimal for future changes, as it doesn't allow the upstream driver to prepare for the SFP module becoming available, as it is at boot. Switch to a find-sfp-bus, add-upstream, del-upstream, put-sfp-bus interface structure instead, which allows the upstream driver to prepare for a module being available as soon as add-upstream is called. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 09 Nov, 2019 5 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller authored
One conflict in the BPF samples Makefile, some fixes in 'net' whilst we were converting over to Makefile.target rules in 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds authored
Pull networking fixes from David Miller: 1) BPF sample build fixes from Björn Töpel 2) Fix powerpc bpf tail call implementation, from Eric Dumazet. 3) DCCP leaks jiffies on the wire, fix also from Eric Dumazet. 4) Fix crash in ebtables when using dnat target, from Florian Westphal. 5) Fix port disable handling whne removing bcm_sf2 driver, from Florian Fainelli. 6) Fix kTLS sk_msg trim on fallback to copy mode, from Jakub Kicinski. 7) Various KCSAN fixes all over the networking, from Eric Dumazet. 8) Memory leaks in mlx5 driver, from Alex Vesker. 9) SMC interface refcounting fix, from Ursula Braun. 10) TSO descriptor handling fixes in stmmac driver, from Jose Abreu. 11) Add a TX lock to synchonize the kTLS TX path properly with crypto operations. From Jakub Kicinski. 12) Sock refcount during shutdown fix in vsock/virtio code, from Stefano Garzarella. 13) Infinite loop in Intel ice driver, from Colin Ian King. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (108 commits) ixgbe: need_wakeup flag might not be set for Tx i40e: need_wakeup flag might not be set for Tx igb/igc: use ktime accessors for skb->tstamp i40e: Fix for ethtool -m issue on X722 NIC iavf: initialize ITRN registers with correct values ice: fix potential infinite loop because loop counter being too small qede: fix NULL pointer deref in __qede_remove() net: fix data-race in neigh_event_send() vsock/virtio: fix sock refcnt holding during the shutdown net: ethernet: octeon_mgmt: Account for second possible VLAN header mac80211: fix station inactive_time shortly after boot net/fq_impl: Switch to kvmalloc() for memory allocation mac80211: fix ieee80211_txq_setup_flows() failure path ipv4: Fix table id reference in fib_sync_down_addr ipv6: fixes rt6_probe() and fib6_nh->last_probe init net: hns: Fix the stray netpoll locks causing deadlock in NAPI path net: usb: qmi_wwan: add support for DW5821e with eSIM support CDC-NCM: handle incomplete transfer of MTU nfc: netlink: fix double device reference drop NFC: st21nfca: fix double free ...
-
git://git.kernel.dk/linux-blockLinus Torvalds authored
Pull block fixes from Jens Axboe: - Two NVMe device removal crash fixes, and a compat fixup for for an ioctl that was introduced in this release (Anton, Charles, Max - via Keith) - Missing error path mutex unlock for drbd (Dan) - cgroup writeback fixup on dead memcg (Tejun) - blkcg online stats print fix (Tejun) * tag 'for-linus-2019-11-08' of git://git.kernel.dk/linux-block: cgroup,writeback: don't switch wbs immediately on dead wbs if the memcg is dead block: drbd: remove a stray unlock in __drbd_send_protocol() blkcg: make blkcg_print_stat() print stats only for online blkgs nvme: change nvme_passthru_cmd64 to explicitly mark rsvd nvme-multipath: fix crash in nvme_mpath_clear_ctrl_paths nvme-rdma: fix a segmentation fault during module unload
-
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queueDavid S. Miller authored
Jeff Kirsher says: ==================== Intel Wired LAN Driver Fixes 2019-11-08 This series contains fixes to igb, igc, ixgbe, i40e, iavf and ice drivers. Colin Ian King fixes a potentially wrap-around counter in a for-loop. Nick fixes the default ITR values for the iavf driver to 50 usecs interval. Arkadiusz fixes 'ethtool -m' for X722 devices where the correct value cannot be obtained from the firmware, so add X722 to the check to ensure the wrong value is not returned. Jake fixes igb and igc drivers in their implementation of launch time support by declaring skb->tstamp value as ktime_t instead of s64. Magnus fixes ixgbe and i40e where the need_wakeup flag for transmit may not be set for AF_XDP sockets that are only used to send packets. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Magnus Karlsson authored
The need_wakeup flag for Tx might not be set for AF_XDP sockets that are only used to send packets. This happens if there is at least one outstanding packet that has not been completed by the hardware and we get that corresponding completion (which will not generate an interrupt since interrupts are disabled in the napi poll loop) between the time we stopped processing the Tx completions and interrupts are enabled again. In this case, the need_wakeup flag will have been cleared at the end of the Tx completion processing as we believe we will get an interrupt from the outstanding completion at a later point in time. But if this completion interrupt occurs before interrupts are enable, we lose it and should at that point really have set the need_wakeup flag since there are no more outstanding completions that can generate an interrupt to continue the processing. When this happens, user space will see a Tx queue need_wakeup of 0 and skip issuing a syscall, which means will never get into the Tx processing again and we have a deadlock. This patch introduces a quick fix for this issue by just setting the need_wakeup flag for Tx to 1 all the time. I am working on a proper fix for this that will toggle the flag appropriately, but it is more challenging than I anticipated and I am afraid that this patch will not be completed before the merge window closes, therefore this easier fix for now. This fix has a negative performance impact in the range of 0% to 4%. Towards the higher end of the scale if you have driver and application on the same core and issue a lot of packets, and towards no negative impact if you use two cores, lower transmission speeds and/or a workload that also receives packets. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-