- 09 Nov, 2018 18 commits
-
-
Stefano Brivio authored
draft-ietf-nvo3-geneve-08 says: It is strongly RECOMMENDED that Path MTU Discovery ([RFC1191], [RFC1981]) be used by setting the DF bit in the IP header when Geneve packets are transmitted over IPv4 (this is the default with IPv6). Now that ICMP error handling is working for GENEVE, we can comply with this recommendation. Make this configurable, though, to avoid breaking existing setups. By default, DF won't be set. It can be set or inherited from inner IPv4 packets. If it's configured to be inherited and we are encapsulating IPv6, it will be set. This only applies to non-lwt tunnels: if an external control plane is used, tunnel key will still control the DF flag. v2: - DF behaviour configuration only applies for non-lwt tunnels, apply DF setting only if (!geneve->collect_md) in geneve_xmit_skb() (Stephen Hemminger) Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Stefano Brivio authored
Export an encap_err_lookup() operation to match an ICMP error against a valid VNI. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Stefano Brivio authored
Use a router between endpoints, implemented via namespaces, set a low MTU between router and destination endpoint, exceed it and check PMTU value in route exceptions. v2: - Change all occurrences of VxLAN to VXLAN (Jiri Benc) - Introduce IPv4 tests right away, if iproute2 doesn't support the 'df' link option they will be skipped (David Ahern) Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Stefano Brivio authored
Allow users to set the IPv4 DF bit in outgoing packets, or to inherit its value from the IPv4 inner header. If the encapsulated protocol is IPv6 and DF is configured to be inherited, always set it. For IPv4, inheriting DF from the inner header was probably intended from the very beginning judging by the comment to vxlan_xmit(), but it wasn't actually implemented -- also because it would have done more harm than good, without handling for ICMP Fragmentation Needed messages. According to RFC 7348, "Path MTU discovery MAY be used". An expired RFC draft, draft-saum-nvo3-pmtud-over-vxlan-05, whose purpose was to describe PMTUD implementation, says that "is a MUST that Vxlan gateways [...] SHOULD set the DF-bit [...]", whatever that means. Given this background, the only sane option is probably to let the user decide, and keep the current behaviour as default. This only applies to non-lwt tunnels: if an external control plane is used, tunnel key will still control the DF flag. v2: - DF behaviour configuration only applies for non-lwt tunnels, move DF setting to if (!info) block in vxlan_xmit_one() (Stephen Hemminger) Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Stefano Brivio authored
Export an encap_err_lookup() operation to match an ICMP error against a valid VNI. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Stefano Brivio authored
For both IPv4 and IPv6, if we can't match errors to a socket, try tunnels before ignoring them. Look up a socket with the original source and destination ports as found in the UDP packet inside the ICMP payload, this will work for tunnels that force the same destination port for both endpoints, i.e. VXLAN and GENEVE. Actually, lwtunnels could break this assumption if they are configured by an external control plane to have different destination ports on the endpoints: in this case, we won't be able to trace ICMP messages back to them. For IPv6 redirect messages, call ip6_redirect() directly with the output interface argument set to the interface we received the packet from (as it's the very interface we should build the exception on), otherwise the new nexthop will be rejected. There's no such need for IPv4. Tunnels can now export an encap_err_lookup() operation that indicates a match. Pass the packet to the lookup function, and if the tunnel driver reports a matching association, continue with regular ICMP error handling. v2: - Added newline between network and transport header sets in __udp{4,6}_lib_err_encap() (David Miller) - Removed redundant skb_reset_network_header(skb); in __udp4_lib_err_encap() - Removed redundant reassignment of iph in __udp4_lib_err_encap() (Sabrina Dubroca) - Edited comment to __udp{4,6}_lib_err_encap() to reflect the fact this won't work with lwtunnels configured to use asymmetric ports. By the way, it's VXLAN, not VxLAN (Jiri Benc) Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Colin Ian King authored
Trivial fix to spelling mistake in dev_err error message Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ganesh Goudar authored
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Li RongQing authored
avoid to compute the hash value if dev is not null, since hash value is not used Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
YueHaibing authored
Fixes gcc '-Wunused-but-set-variable' warning: drivers/net/ethernet/broadcom/genet/bcmgenet.c: In function 'bcmgenet_power_down': drivers/net/ethernet/broadcom/genet/bcmgenet.c:1136:6: warning: variable 'ret' set but not used [-Wunused-but-set-variable] bcmgenet_power_down should return 'ret' instead of 0. Fixes: ca8cf341 ("net: bcmgenet: propagate errors from bcmgenet_power_down") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Jakub Kicinski says: ==================== net: sched: prepare for more Qdisc offloads This series refactors the "switchdev" Qdisc offloads a little. We have a few Qdiscs which can be fully offloaded today to the forwarding plane of switching devices. First patch adds a helper for handing statistic dumps, the code seems to be copy pasted between PRIO and RED. Second patch removes unnecessary parameter from RED offload function. Third patch makes the MQ offload use the dump helper which helps it behave much like PRIO and RED when it comes to the TCQ_F_OFFLOADED flag. Patch 4 adds a graft helper, similar to the dump helper. Patch 5 is unrelated to offloads, qdisc_graft() code seemed ripe for a small refactor - no functional changes there. Last two patches move the qdisc_put() call outside of the sch_tree_lock section for RED and PRIO. The child Qdiscs will get removed from the hierarchy under the lock, but having the put (and potentially destroy) called outside of the lock helps offload which may choose to sleep, and it should generally lower the Qdisc change impact. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
Move destroying of the old child qdiscs outside of the sch_tree_lock() section. This should improve the software qdisc replace but is even more important for offloads. Calling offloads under a spin lock is best avoided, and child's destroy would be called under sch_tree_lock(). Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
Move destroying of the old child qdisc outside of the sch_tree_lock() section. This should improve the software qdisc replace but is even more important for offloads. Firstly calling offloads under a spin lock is best avoided. Secondly the destroy event of existing child would have been sent to the offload device before the replace, causing confusion. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
The code for grafting Qdiscs when there is a parent has two needless indentation levels, and breaks the "keep the success path unindented" guideline. Refactor. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
Qdisc graft operation of offload-capable qdiscs performs a few extra steps which are identical among all the qdiscs. Add a helper to share this code. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
PRIO and RED mark the qdisc with TCQ_F_OFFLOADED upon successful offload, make MQ do the same. The consistency will help with consistent graft callback behaviour. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
Offload dump helper does not use opt parameter, remove it. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
Qdisc dump operation of offload-capable qdiscs performs a few extra steps which are identical among all the qdiscs. Add a helper to share this code. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 08 Nov, 2018 22 commits
-
-
David S. Miller authored
Heiner Kallweit says: ==================== net: phy: improve and simplify phylib state machine This patch series is based on two axioms: - During autoneg a PHY always reports the link being down - Info in clause 22/45 registers doesn't allow to differentiate between these two states: 1. Link is physically down 2. A link partner is connected and PHY is autonegotiating In both cases "link up" and "aneg finished" bits aren't set. One consequence is that having separate states PHY_NOLINK and PHY_AN isn't needed. By using these two axioms the state machine can be significantly simplified. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Heiner Kallweit authored
Use phy_check_link_status in more places in the state machine. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Heiner Kallweit authored
After the recent changes in the state machine state PHY_AN isn't used any longer and can be removed. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Heiner Kallweit authored
In few places in the state machine the state is set to PHY_RUNNING or PHY_NOLINK after doing a phy_read_status(). So factor this out to phy_check_link_status(). First use it in phy_start_aneg(): By setting the state to PHY_RUNNING or PHY_NOLINK directly we can remove the code to handle the case that we're using interrupts and aneg was finished already. Definition of phy_link_up and phy_link_down needs to be moved because they are called in the new function. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Heiner Kallweit authored
If aneg isn't finished yet then the PHY reports the link as down. There's no benefit in setting the state to PHY_AN because the next state machine run would set the status to PHY_NOLINK anyway (except in the meantime aneg has been finished and link is up). Therefore we can set the state to PHY_RUNNING or PHY_NOLINK directly. In addition change the do_carrier parameter in phy_link_down() to true. If carrier was marked as up before (what should never be the case because PHY was in state PHY_HALTED before) then we should mark it as down now. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Heiner Kallweit authored
If aneg is enabled and the PHY reports the link as up then definitely aneg finished successfully. Therefore this check is useless and can be removed. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queueDavid S. Miller authored
Jeff Kirsher says: ==================== Intel Wired LAN Driver Updates 2018-11-07 This series contains updates to almost all of the Intel wired LAN drivers. Lance Roy replaces a spin lock with lockdep_assert_held() for igbvf driver in move toward trying to remove spin_is_locked(). Colin Ian King fixes a potential null pointer dereference by adding a check in ixgbe. Also fixed the igc driver by properly assigning the return error code of a function call, so that we can properly check it. Shannon Nelson updates the ixgbe driver to not block IPsec offload when in VEPA mode, in VEB mode, IPsec offload is still blocked because the device drops packets into a black hole. Jake adds support for software timestamping for packets sent over ixgbevf. Also modifies i40e, iavf, igb, igc, and ixgbe to delay calling skb_tx_timestamp() to the latest point possible, which is just prior to notifying the hardware of the new Tx packet. Todd adds the new WoL filter flag so that we properly report that we do not support this new feature. YueHaibing from Huawei fixes the igc driver by cleaning up variables that are not "really" used. Dan Carpenter cleans up igc whitespace issues. Miroslav Lichvar fixes e1000e for potential underflow issue in the timecounter, so modify the driver to use timecounter_cyc2time() to allow non-monotonic SYSTIM readings. Sasha provides additional igc cleanups based on community feedback. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
John Hurley says: ==================== nfp: add and use tunnel netdev helpers A recent patch introduced the function netif_is_vxlan() to verify the tunnel type of a given netdev as vxlan. Add a similar function to detect geneve netdevs and make use of this function in the NFP driver. Also make use of the vxlan helper where applicable. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
John Hurley authored
Offload of geneve decap rules is supported in NFP. Include geneve in the check for supported types. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
John Hurley authored
Make use of the recently added VXLAN and geneve helper functions to determine the type of the netdev from its rtnl_link_ops. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
John Hurley authored
Add a helper function to determine if the type of a netdev is geneve based on its rtnl_link_ops. This allows drivers that may wish to offload tunnels to check the underlying type of the device. A recent patch added a similar helper to vxlan.h Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Edward Cree authored
Expose the MUM/SUC Firmware, UEFI Expansion ROM and MC Status partitions of the NIC's NVRAM as MTDs if found on the NIC. The first two are needed in order to properly update them when performing firmware updates; the MC Status partition is used to determine whether a signed firmware image was accepted or rejected by a Secure NIC. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Michał Mirosław says: ==================== net/vlan: prepare for removal of VLAN_TAG_PRESENT This is a preparatory patchset before removing the use of VLAN_TAG_PRESENT bit in skb->vlan_tci as indication of VLAN offload. This set includes only cleanups that allow abstracting of code testing VLAN tag presence in drivers and networking code. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Michał Mirosław authored
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Michał Mirosław authored
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Michał Mirosław authored
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Michał Mirosław authored
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yafang Shao authored
Set the backlog earlier in inet_dccp_listen() and inet_listen(), then we can avoid the redundant setting. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Davide Caratti authored
GSO tunneled packets are always segmented in software before they are transmitted by a VLAN, even when the lower device can offload tunnel encapsulation and VLAN together (i.e., some bits in NETIF_F_GSO_ENCAP_ALL mask are set in the lower device 'vlan_features'). If we let VLANs have the same tunnel offload capabilities as their lower device, throughput can improve significantly when CPU is limited on the transmitter side. - set NETIF_F_GSO_ENCAP_ALL bits in the VLAN 'hw_features', to ensure that 'features' will have those bits zeroed only when the lower device has no hardware support for tunnel encapsulation. - for the same reason, copy GSO-related bits of 'hw_enc_features' from lower device to VLAN, and ensure to update that value when the lower device changes its features. - set NETIF_F_HW_CSUM bit in the VLAN 'hw_enc_features' if 'real_dev' is able to compute checksums at least for a kind of packets, like done with commit 8403debe ("vlan: Keep NETIF_F_HW_CSUM similar to other software devices"). This avoids software segmentation due to mismatching checksum capabilities between VLAN's 'features' and 'hw_enc_features'. Reported-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Paolo Abeni authored
The tun XDP sendmsg code path, unconditionally computes the symmetric hash of each packet for RFS's sake, even when we could skip it. e.g. when the device has a single queue. This change adds the check already in-place for the skb sendmsg path to avoid unneeded hashing. The above gives small, but measurable, performance gain for VM xmit path when zerocopy is not enabled. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Mathias Thore authored
Add byte queue limits support in the fsl_ucc_hdlc driver. Signed-off-by: Mathias Thore <mathias.thore@infinera.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Heiner Kallweit authored
Instead of listing every single PHYID, load the driver for every PHYID with a Realtek OUI, independent of model number and revision. This patch also improves two further aspects: - constify realtek_tbl[] - the mask should have been 0xffffffff instead of 0x001fffff so far, by masking out some bits a PHY from another vendor could have been matched Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-