- 07 Apr, 2016 34 commits
-
-
Jon Paul Maloy authored
Resetting a bearer/interface, with the consequence of resetting all its pertaining links, is not an atomic action. This becomes particularly evident in very large clusters, where a lot of traffic may happen on the remaining links while we are busy shutting them down. In extreme cases, we may even see links being re-created and re-established before we are finished with the job. To solve this, we now introduce a solution where we temporarily detach the bearer from the interface when the bearer is reset. This inhibits all packet reception, while sending still is possible. For the latter, we use the fact that the device's user pointer now is zero to filter out which packets can be sent during this situation; i.e., outgoing RESET messages only. This filtering serves to speed up the neighbors' detection of the loss event, and saves us from unnecessary probing. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jon Paul Maloy authored
When enabling a bearer we create a 'neigbor discoverer' instance by calling the function tipc_disc_create() before the bearer is actually registered in the list of enabled bearers. Because of this, the very first discovery broadcast message, created by the mentioned function, is lost, since it cannot find any valid bearer to use. Furthermore, the used send function, tipc_bearer_xmit_skb() does not free the given buffer when it cannot find a bearer, resulting in the leak of exactly one send buffer each time a bearer is enabled. This commit fixes this problem by introducing two changes: 1) Instead of attemting to send the discovery message directly, we let tipc_disc_create() return the discovery buffer to the calling function, tipc_enable_bearer(), so that the latter can send it when the enabling sequence is finished. 2) In tipc_bearer_xmit_skb(), as well as in the two other transmit functions at the bearer layer, we now free the indicated buffer or buffer chain when a valid bearer cannot be found. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Tom Herbert says: ==================== udp: GRO in UDP sockets This patch set adds GRO functions (gro_receive and gro_complete) to UDP sockets and removes udp_offload infrastructure. Add GRO functions (gro_receive and gro_complete) to UDP sockets. In udp_gro_receive and udp_gro_complete a socket lookup is done instead of looking up the port number in udp_offloads. If a socket is found and there are GRO functions for it then those are called. This feature allows binding GRO functions to more than just a port number. Eventually, we will be able to use this technique to allow application defined GRO for an application protocol by attaching BPF porgrams to UDP sockets for doing GRO. In order to implement these functions, we added exported udp6_lib_lookup_skb and udp4_lib_lookup_skb functions in ipv4/udp.c and ipv6/udp.c. Also, inet_iif and references to skb_dst() were changed to check that dst is set in skbuf before derefencing. In the GRO path there is now a UDP socket lookup performed before dst is set, to the get the device in that case we simply use skb->dev. Tested: Ran various combinations of VXLAN and GUE TCP_STREAM and TCP_RR tests. Did not see any material regression. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Tom Herbert authored
Now that the UDP encapsulation GRO functions have been moved to the UDP socket we not longer need the udp_offload insfrastructure so removing it. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Tom Herbert authored
Adapt geneve_gro_receive, geneve_gro_complete to take a socket argument. Set these functions in tunnel_config. Don't set udp_offloads any more. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Tom Herbert authored
Adapt gue_gro_receive, gue_gro_complete to take a socket argument. Don't set udp_offloads any more. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Tom Herbert authored
Adapt vxlan_gro_receive, vxlan_gro_complete to take a socket argument. Set these functions in tunnel_config. Don't set udp_offloads any more. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Tom Herbert authored
Add gro_receive and gro_complete to struct udp_tunnel_sock_cfg. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Tom Herbert authored
This patch adds GRO functions (gro_receive and gro_complete) to UDP sockets. udp_gro_receive is changed to perform socket lookup on a packet. If a socket is found the related GRO functions are called. This features obsoletes using UDP offload infrastructure for GRO (udp_offload). This has the advantage of not being limited to provide offload on a per port basis, GRO is now applied to whatever individual UDP sockets are bound to. This also allows the possbility of "application defined GRO"-- that is we can attach something like a BPF program to a UDP socket to perfrom GRO on an application layer protocol. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Tom Herbert authored
Add externally visible functions to lookup a UDP socket by skb. This will be used for GRO in UDP sockets. These functions also check if skb->dst is set, and if it is not skb->dev is used to get dev_net. This allows calling lookup functions before dst has been set on the skbuff. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Tom Herbert authored
In inet_iif check if skb_rtable is NULL for the skb and return skb->skb_iif if it is. This change allows inet_iif to be called before the dst information has been set in the skb (e.g. when doing socket based UDP GRO). Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Hannes Frederic Sowa says: ==================== sock: lockdep tightening First patch is from Eric Dumazet and improves lockdep accuracy for socket locks. After that, second patch introduces lockdep_sock_is_held and uses it. Final patch reverts and reworks the lockdep fix from Daniel in the filter code, as we now have tighter lockdep support. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Hannes Frederic Sowa authored
This reverts commit 5a5abb1f ("tun, bpf: fix suspicious RCU usage in tun_{attach, detach}_filter") and replaces it to use lock_sock around sk_{attach,detach}_filter. The checks inside filter.c are updated with lockdep_sock_is_held to check for proper socket locks. It keeps the code cleaner by ensuring that only one lock governs the socket filter instead of two independent locks. Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Hannes Frederic Sowa authored
The socket is either locked if we hold the slock spin_lock for lock_sock_fast and unlock_sock_fast or we own the lock (sk_lock.owned != 0). Check for this and at the same time improve that the current thread/cpu is really holding the lock. Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Hannes Frederic Sowa authored
During release_sock we use callbacks to finish the processing of outstanding skbs on the socket. We actually are still locked, sk_locked.owned == 1, but we already told lockdep that the mutex is released. This could lead to false positives in lockdep for lockdep_sock_is_held (we don't hold the slock spinlock during processing the outstanding skbs). I took over this patch from Eric Dumazet and tested it. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
David Ahern reported panics in __inet_hash() caused by my recent commit. The reason is inet_reuseport_add_sock() was still using sk_nulls_for_each_rcu() instead of sk_for_each_rcu(). SO_REUSEPORT enabled listeners were causing an instant crash. While chasing this bug, I found that I forgot to clear SOCK_RCU_FREE flag, as it is inherited from the parent at clone time. Fixes: 3b24d854 ("tcp/dccp: do not touch listener sk_refcnt under synflood") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: David Ahern <dsa@cumulusnetworks.com> Tested-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queueDavid S. Miller authored
Jeff Kirsher says: ==================== 1GbE Intel Wired LAN Driver Updates 2016-04-06 This series contains updates to e1000, e1000e, igb and Kconfig. Alex fixes igb where we were casting the MAC address as __beXX and then passing it into le32_to_cpu, when we could simply cast as __lexx to maintain consistency since it is already little endian. Then enabled bulk free in transmit cleanup for igb. John Holland enables igb to pickup the MAC address from a device tree blob when CONFIG_OF has been enabled. Doron Shikmoni fixes a bug in the output of "ethtool -m ethX" where the data byte appeared duplicated. Stefan fixes up e1000 and e1000e ethtool offline tests which were calling dev_close() which causes IFF_UP to be cleared which removes teh interface routes and some addresses, so use ndo_stop() instead. Jiri Benc cleans up some old links in the Kconfig for Intel drivers where we referred to a URL which is no longer valid. I am so glad Jiri has the time in his day to spend clicking on and testing all the URL links in the the kernel. Arika Chen reverts the addition of a 'rtnl_unlock()' which had a unmatched 'rtnl_lock()' call before it. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queueDavid S. Miller authored
Jeff Kirsher says: ==================== 40GbE Intel Wired LAN Driver Updates 2016-04-06 This series contains updates to i40e and i40evf. Deepthi adds a debug message to display the MSIx vector count for hardware capabilities. Shannon removed the setting of debug_mask at startup to take care of an issue where all the device capabilities getting printed when we had not asked for it. Moved the NVM status out of the admin queue structure, since it should really stay with the other NVM data structures. Akeem added the flush routine to the end of the reset flow to avoid problems in the pass-through routines. Jesse moves a local variable deeper into the depths of the driver where the light is low and the context is great. Then cleaned up the tx_ring argument since it was not making good arguments. Improved performance by not "checking for FCoE" by re-ordering the FCoE checks. Anjali adds the support for changing a VF from non-trusted to trusted and vice-versa. Mitch adds opcodes and structures to support RSS configuration by PF driver on behalf of the VF driver. Fixed how the VLAN feature flags are set. Kiran added defines for RSS, flow director, flexible payload and IPv6. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Arika Chen authored
This reverts commit 3eb14ea8 ("igb: Fix a deadlock in igb_sriov_reinit") It is the same as commit f468adc9 ("igb: missing rtnl_unlock in igb_sriov_reinit()") There is no rtnl_lock() in igb_resume before, rtnl_unlock will cause a deadlock. Signed-off-by: Arika Chen <arika.chen@huawei.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Jiri Benc authored
The Kconfig for Intel NICs references two different URLs for the "Adapter & Driver ID Guide". Neither of those two links works. The current URL seems to be http://www.intel.com/content/www/us/en/support/network-and-i-o/ethernet-products/000005584.html but given it's apparently constantly changing, there's no point in having it in the help text. Just keep a generic pointer to http://support.intel.com. Hopefully, this one will have a longer live. It still works, at least. Furthermore, remove a link to "the latest Intel PRO/100 network driver for Linux", this has no place in the mainline kernel and the latest Linux driver it offers is from 2006, anyway. Signed-off-by: Jiri Benc <jbenc@redhat.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Mitch Williams authored
Correctly set the VLAN feature flags after setting the rest of the netdev flags. And don't set them in hw_features, because these can't be controlled by the VF driver. Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Harshitha Ramamurthy authored
Signed-off-by: Harshitha Ramamurthy <harshitha.ramamurthy@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Kiran Patil authored
Add defines for input set mask (RSS, flow director, flexible payload), including defines specific to IPv6. Change-ID: Ie95ef7d0916a4d6ca011c194283f959774c8dce9 Signed-off-by: Kiran Patil <kiran.patil@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Shannon Nelson authored
The logic that checks AQ events for NVM done events is better kept in nvm.c with the rest of the nvmupdate handling code. Change-ID: I2ea58980df8ecaa3726b28a37bff3dfcb8df03dc Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Mitch Williams authored
Add opcodes and structures to support RSS configuration by PF driver on behalf of the VF drivers. This reduces complexity in the VF driver and allows us to support future hardware designs without modifying the VF driver. Change-ID: I8c75765c630eacb71f95967f1109a198542593ac Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Shannon Nelson authored
The NVM update status info should stay collected together, not spread across different structs. Change-ID: Ic16f9e9fd79945d865bb7226184c889884585025 Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Shannon Nelson authored
The VFs can request their queues to be set up into polling mode, rather than interrupt mode, which works well for supporting things like DPDK, but this should not be available when working in an multi-function support device. Change-ID: Id36792e4e7422db8f2033336507211f68f14ff6f Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Anjali Singhai Jain authored
This patch adds hook to support changing a VF from not-trusted to trusted and vice-versa. Fixed the wrappers and function prototype. Changed the dmesg to reflex the current state better. This patch also disables turning on/off trusted VF in MFP mode. Change-ID: Ibcd910935c01f0be1f3fdd6d427230291ee92ebe Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Jesse Brandeburg authored
As it turns out, calling into other files from hot path hurts performance a lot. In this case the majority of the time we call "check FCoE" and the packet is *not* FCoE, but this call was taking 5% of our total cycles spent on receive. Change-ID: I080552c26e7060bc7b78504dc2763f6f0b3d8c76 Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Jesse Brandeburg authored
Some of the tx_ring arguments can be deleted since they are not used. Change-ID: I99275b0f191d7f63ec2f05061919904940c36f31 Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Jesse Brandeburg authored
A local variable could move down inside the context where it is used. Change-ID: I9caba9e1eacf921037077f2665cbce83fd8e95d6 Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Akeem G Abodunrin authored
This patch moves the HW flush routine to the end of the reset flow, after the completion of writing to the device VFLR registers- the benefit is to avoid problems in the passthrough routines. Change-ID: Ieb56866f21895e6c1fc514b7328c3df79807a57c Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Shannon Nelson authored
Don't set our internal debug_mask at startup unless we get specific signal to from the debug module parameter. This should take care of the issue with all the device capabilities getting printed even when we hadn't asked for the debug info. Change-ID: I7fbc6bd8b11ed9b0631ec018ff36015a04100b6c Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Deepthi Kavalur authored
Display MSIx vector count for HW capabilities. Change-ID: I4b41e9b50360cf660e7fbcb85b9390fedcf313b1 Signed-off-by: Deepthi Kavalur <deepthi.kavalur@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
- 06 Apr, 2016 6 commits
-
-
Stefan Assmann authored
Calling dev_close() causes IFF_UP to be cleared which will remove the interfaces routes and some addresses. That's probably not what the user intended when running the offline selftest. Besides this does not happen if the interface is brought down before the test, so the current behaviour is inconsistent. Instead call the net_device_ops ndo_stop function directly and avoid touching IFF_UP at all. Signed-off-by: Stefan Assmann <sassmann@kpanic.de> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
David S. Miller authored
Jiri Pirko says: ==================== mlxsw: Introduce support for Data Center Bridging Ido says: This patchset introduces support for Quality of Service (QoS) as part of the IEEE Data Center Bridiging (DCB) standards. Patches 1-9 do the required device initialization. Specifically, patches 1-6 initialize the ports' headroom buffers, which are used at ingress to store incoming packets while they go through the switch's pipeline. Patches 7-9 complete them by initializing the egress scheduling. The pipeline mentioned above determines the packet's egress port(s) and traffic class. Ideally, once out of the pipeline the packet moves to the switch's shared buffer (to be introduced in Jiri's patchset, currently default values are used) and scheduled for transmission according to its traffic class. The egress scheduling is configured according to the 802.1Qaz standard, which is part of the DCB infrastructure supported by Linux. This is introduced in patches 10-12. Even after going through the pipeline packets are not always eligible to enter the shared buffer. This is determined by the amount of available space and the quotas associated with the packet. However, if flow control is enabled and the packet is associated with the lossless flow, then it will stay in the headroom and won't be discarded. This is introduced in patches 13-17. Please check individual commit messages for more info, as I tried to keep them pretty detailed. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
Implement the appropriate DCB ops and allow a user to configure certain traffic classes as lossless. The operation configures PFC for both the egress (respecting PFC frames) and ingress (sending PFC frames) parts of the port. At egress, when a PFC frame is received for a PFC enabled priority, then all the priorities mapped to the same TC are stopped. At ingress, the priority group (PG) buffers to which the enabled PFC priorities are mapped are configured to be lossless. PFC frames will be transmitted when the Xoff threshold is crossed. The user-supplied delay parameter is used to determine the PG's size according to the following formula: PG_SIZE = PG_SIZE_LOSSY + delay * CELL_FACTOR + MTU In the worst case scenario the delay will be made up of packets that are all of size CELL_SIZE + 1, which means each packet will require almost twice its true size when buffered in the switch. We therefore multiply this value by the "cell factor", which is close to 2. Another MTU is added in case the transmitting host already started transmitting a maximum length frame when the PFC packet was received. As with PAUSE enabled ports, when the port's MTU is changed both the PGs' size and threshold are adjusted accordingly. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
We are going to add support for PFC as part of DCB ops, which requires us to report the number of PFC frames sent and received per priority. Add per priority counters in order to report number of PFC frames sent and received per priority. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
When a packet ingress the switch it's placed in its assigned priority group (PG) buffer in the port's headroom buffer while it goes through the switch's pipeline. After going through the pipeline - which determines its egress port(s) and traffic class - it's moved to the switch's shared buffer awaiting transmission. However, some packets are not eligible to enter the shared buffer due to exceeded quotas or insufficient space. Marking their associated PGs as lossless will cause the packets to accumulate in the PG buffer. Another reason for packets accumulation are complicated pipelines (e.g. involving a lot of ACLs). To prevent packets from being dropped a user can enable PAUSE frames on the port. This will mark all the active PGs as lossless and set their size according to the maximum delay, as it's not configured by user. +----------------+ + | | | | | | | | | | | | | | | | | | Delay | | | | | | | | | | | | | | | Xon/Xoff threshold +----------------+ + | | | | | | 2 * MTU | | | +----------------+ + The delay (612 [Cells]) was calculated according to worst-case scenario involving maximum MTU and 100m cables. After marking the PGs as lossless the device is configured to respect incoming PAUSE frames (Rx PAUSE) and generate PAUSE frames (Tx PAUSE) according to user's settings. Whenever the port's headroom configuration changes we take into account the PAUSE configuration, so that we correctly set the PG's type (lossy / lossless), size and threshold. This can happen when: a) The port's MTU changes, as it directly affects the PG's size. b) A PG is created following user configuration, by binding a priority to it. Note that the relevant SUPPORTED flags were already mistakenly set by the driver before this commit. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
When configuring PAUSE frames and PFC we'll need to configure the Xon/Xoff threshold for the priority group (PG) buffers. Add the Xon/Xoff threshold fields to the PBMC register so that we can configure these when needed. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-