- 18 Feb, 2016 40 commits
-
-
Florian Westphal authored
mmapped netlink has a number of unresolved issues:

- TX zerocopy support had to be disabled more than a year ago via commit 4682a035 ("netlink: Always copy on mmap TX.") because the content of the mmapped area can change after netlink attribute validation but before message processing.
- RX support was implemented mainly to speed up nfqueue dumping packet payload to userspace. However, since commit ae08ce00 ("netfilter: nfnetlink_queue: zero copy support") we avoid one copy with the socket-based interface too (via the skb_zerocopy helper).

The other problem is that skbs attached to an mmapped netlink socket behave differently from normal skbs:

- they don't have a shinfo area, so all functions that use skb_shinfo() (e.g. skb_clone) cannot be used.
- reserving headroom prevents userspace from seeing the content, as it expects the message to start at skb->head. See for instance commit aa3a0220 ("netlink: not trim skb for mmaped socket when dump").
- skbs handed e.g. to netlink_ack must have non-NULL skb->sk, else we crash because it needs the sk to check if a tx ring is attached. This is also not obvious and leads to non-intuitive bug fixes such as 7c7bdf35 ("netfilter: nfnetlink: use original skbuff when acking batches").

mmapped netlink also didn't play nicely with the skb_zerocopy helper used by nfqueue and openvswitch. Daniel Borkmann fixed this via commit 6bb0fef4 ("netlink, mmap: fix edge-case leakages in nf queue zero-copy"), but at the cost of also needing to provide remaining length to the allocation function.

nfqueue also has problems when used with mmapped rx netlink:

- mmapped netlink doesn't allow use of nfqueue batch verdict messages. The problem is that in the mmap case, the allocation time also determines the order in which the frame will be seen by userspace (A allocating before B means that A is located in an earlier ring slot, but this also means that B might get a lower sequence number than A, since seqno is decided later). To fix this we would need to extend the spinlocked region to also cover the allocation and message setup, which isn't desirable.
- nfqueue can now be configured to queue large (GSO) skbs to userspace. Queuing GSO packets is faster than having to force a software segmentation in the kernel, so this is a desirable option. However, with an mmap based ring one has to use 64kb per ring slot element, else mmap has to fall back to the socket path (NL_MMAP_STATUS_COPY) for all large packets.

To use the mmap interface, userspace not only has to probe for mmap netlink support, it also has to implement a recv/socket receive path in order to handle messages that exceed the size of an rx ring element.

Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Ken-ichirou MATSUZAWA <chamaken@gmail.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
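The batch verdict ordering problem can be illustrated with a small simulation. This is a sketch and not kernel code: the `Ring` class, the `allocate` method and the `seq` counter are hypothetical stand-ins for ring slot allocation and for the sequence number that is assigned later, under the queue lock.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Ring:
    """Hypothetical model of an mmapped rx ring: slot order is fixed at allocation."""
    slots: list = field(default_factory=list)

    def allocate(self, name):
        frame = {"name": name, "seqno": None}
        self.slots.append(frame)  # the ring position is decided here
        return frame


seq = count(1)  # stands in for the counter taken inside the spinlocked region
ring = Ring()

# A allocates its frame first, then B allocates.
frame_a = ring.allocate("A")
frame_b = ring.allocate("B")

# B reaches the locked region first, so it gets the lower sequence number.
frame_b["seqno"] = next(seq)
frame_a["seqno"] = next(seq)

# Userspace reads slots in ring order and sees the sequence numbers out of order.
seen = [(f["name"], f["seqno"]) for f in ring.slots]
print(seen)
```

In this model the reader sees A (seqno 2) before B (seqno 1), which is the kind of reordering that a verdict covering a range of sequence numbers cannot tolerate.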
-
Jamal Hadi Salim authored
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
When VLANs are created / destroyed on a VLAN filtering bridge (MASTER flag set), the configuration is passed down to the hardware. However, when only the flags (e.g. PVID) are toggled, the configuration is done in the software bridge alone. While it is possible to pass these flags to hardware when invoked with the SELF flag set, this creates inconsistency with regards to the way the VLANs are initially configured. Pass the flags down to the hardware even when the VLAN already exists and only the flags are toggled. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
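The behaviour change can be sketched as follows. This is a simplified model under stated assumptions: the `Bridge` and `Hardware` classes and their `vlan_add` methods are hypothetical, and the flag constant is only illustrative.

```python
PVID_FLAG = 1 << 1  # illustrative flag bit, named here only for the sketch


class Hardware:
    """Hypothetical device that records every VLAN configuration it receives."""

    def __init__(self):
        self.calls = []

    def vlan_add(self, vid, flags):
        self.calls.append((vid, flags))


class Bridge:
    """Hypothetical VLAN filtering bridge keeping a software copy of the VLAN table."""

    def __init__(self, hw):
        self.hw = hw
        self.vlans = {}

    def vlan_add(self, vid, flags):
        # Update the software state first.
        self.vlans[vid] = flags
        # Push the configuration down even if the VLAN already existed and
        # only its flags were toggled.
        self.hw.vlan_add(vid, flags)


hw = Hardware()
br = Bridge(hw)
br.vlan_add(10, 0)          # VLAN is created
br.vlan_add(10, PVID_FLAG)  # only the flags change
print(hw.calls)
```

With this flow the hardware sees both the initial creation and the later flag toggle, so software and hardware state do not diverge.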
-
Stefan Roese authored
Add code to select SGMII-to-copper mode upon SGMII interface selection. Signed-off-by: Stefan Roese <sr@denx.de> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alison Schofield authored
divamnt stores a start_time at module init and uses it to calculate elapsed time. The elapsed time, stored in secs and usecs, is part of the trace data the driver maintains for the DIVA Server ISDN cards. No change to the format of that time data is required. To avoid overflow on 32-bit systems use ktime_get_ts64() to return the elapsed monotonic time since system boot. This is a change from real to monotonic time. Since the driver only stores elapsed time, monotonic time is sufficient and more robust against real time clock changes. These new monotonic values can be more useful for debugging because they can be easily compared to other monotonic timestamps. Note elapsed time values will now start at system boot time rather than module load time, so they will differ slightly from previously reported values. Remove declaration and init of previously unused time constants: start_sec, start_usec. Signed-off-by: Alison Schofield <amsfield22@gmail.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
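The secs/usecs split of a monotonic timestamp can be sketched in userspace as below. This is only an analogy: Python's `time.monotonic_ns()` stands in for `ktime_get_ts64()`, its reference point is unspecified, and the helper name `elapsed_secs_usecs` is made up for this example.

```python
import time

NSEC_PER_SEC = 1_000_000_000
NSEC_PER_USEC = 1_000


def elapsed_secs_usecs(ns):
    """Split a nanosecond count into whole seconds and leftover microseconds."""
    secs, rem_ns = divmod(ns, NSEC_PER_SEC)
    return secs, rem_ns // NSEC_PER_USEC


# A monotonic clock is unaffected by wall-clock adjustments.
secs, usecs = elapsed_secs_usecs(time.monotonic_ns())
print(secs, usecs)
```

Python integers do not overflow, so the sketch shows only the unit split, not the 32-bit overflow concern that motivates the 64-bit kernel interface.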
-
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
David S. Miller authored
Jeff Kirsher says:

====================
40GbE Intel Wired LAN Driver Updates 2016-02-17

This series contains updates to i40e/i40evf once again.

Mitch uses a define instead of a magic number. Adds support for packet split receive on VFs, which is disabled by default. Expands on a code comment which was not verbose or really helpful. Fixes an issue where, if a reset failed to complete, the driver did not properly set the adapter state, which would cause a panic on rmmod; the adapter state is now set to DOWN to avoid the panic.

Jesse cleans up a "dump" in debugfs that never panned out to be useful.

Anjali adds a workaround for cases where we might have interrupts that get lost but write-back (WB) happened. Fixes an issue by falling back to enabling unicast, multicast and broadcast promiscuous mode when the driver must disable its use of "default port" (defport mode) due to internal incompatibility with Multiple Function per Port (MFP). Fixes an issue where queues should never be enabled/disabled in the interrupt handler.

Kiran cleans up the code which used a hard coded base VEB SEID, since it was removed from the specification.

Shannon adds a few bits for better debug messages. Fixes an obscure corner case, where it was possible to clear the NVM update wait flag when no update_done message was actually received.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
-
Colin Ian King authored
Ever since commit 04ed3e74 ("net: change netdev->features to u32") the format string fmt_long_hex has not been used, so we may as well remove it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Catherine Sullivan authored
Bump. Change-ID: Ie280dc67e37a1cf667c3469499a4fb90f4177b75 Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Anjali Singhai Jain authored
In MFP mode, particularly when we were setting the PF VSI in limited promiscuous, the HW switch was still mirroring the outgoing packets from other VSIs (VF/VMdq) onto the PF VSI. With this new bit set, the mirroring doesn't happen any more and so we are in limited promiscuous on the PF VSI in MFP, which is similar to defport. An API check is not required, since this bit is reserved for FW API version < 1.5. Also update copyright year in file headers. Change-ID: I9840cb95f11dde733d943cb03ce84f68b9611bc8 Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Shannon Nelson authored
In one obscure corner case, it was possible to clear the NVM update wait flag when no update_done message was actually received. This patch cleans the event descriptor before use, and moves the opcode check to where it won't get done if there was no event to clean. Also update copyright year in file headers. Change-ID: I68bbc41965e93f4adf07cbe98b9dfd63d41509a4 Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
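The two parts of the fix (clean the event descriptor before use, and only look at the opcode when an event was actually received) can be sketched like this. All names here are hypothetical, including the opcode constant, and the model ignores everything about the real admin queue.

```python
from dataclasses import dataclass

NVM_UPDATE_OPCODE = 0x0703  # illustrative value, assumed for this sketch


@dataclass
class Event:
    opcode: int = 0


def clean_queue(pending, event):
    """Copy the next pending opcode into `event`; return False if nothing is pending."""
    if not pending:
        return False
    event.opcode = pending.pop(0)
    return True


def poll_events(pending, event, state):
    # Clean the descriptor first so a stale opcode from an earlier event
    # cannot be mistaken for a fresh update_done message.
    event.opcode = 0
    if not clean_queue(pending, event):
        return  # no event: the opcode check below is never reached
    if event.opcode == NVM_UPDATE_OPCODE:
        state["nvm_wait"] = False


state = {"nvm_wait": True}
stale = Event(opcode=NVM_UPDATE_OPCODE)  # leftover from a previous poll

poll_events([], stale, state)            # nothing pending, flag must stay set
still_waiting = state["nvm_wait"]

poll_events([NVM_UPDATE_OPCODE], stale, state)  # a real update_done arrives
print(still_waiting, state["nvm_wait"])
```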
-
Mitch Williams authored
If a reset fails to complete, the driver gets its affairs in order and awaits the cold solace of rmmod. Unfortunately, it was not properly setting the adapter state, which would cause a panic on rmmod, instead of the desired surcease. Set the adapter state to DOWN in this case, and avoid a panic. Change-ID: I6fdd9906da52e023f8dc744f7da44b5d95278ca9 Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Shannon Nelson authored
Make sure we return EBUSY while finishing up a reset, and add a few bits for better debug messages. Change-ID: I23f6c28a8d96d7aa171abcc265737cec7826c292 Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Mitch Williams authored
Explain why we cannot remove this code, even though it works differently than any of our other interrupt cause handling code. Change-ID: Ie66203bd037a466066036611c31d44f759ec5176 Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Anjali Singhai Jain authored
The queues should never be enabled/disabled in the interrupt handler, ICR0 interrupt enable should be the only thing that needs to be dynamically changed in the handler. This patch fixes that. Without this patch X722 platforms were seeing weird ping timings when in Legacy mode since it takes a whole lot of time for the HW/FW to re-enable queues. Change-ID: If065afc45d81c5a19d4a94a00cd5b8f61cefc40c Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Mitch Williams authored
In the case where we have a page fully used by receive data, we need to release the page fully to the stack. Instead of calling get_page (which increments the page count) followed by free_page (which decrements the page count), just donate our reference to the stack. Although this donation is not tax deductible, it does allow us to avoid two very expensive atomic operations that reverse each other. Change-ID: If70739792d5748995fc175ec92ac2171ed4ad8fc Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
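A toy reference-count model makes the saving visible. The `Page` class and both release functions are hypothetical; they only count how many atomic-style operations each approach performs.

```python
class Page:
    """Toy page with a reference count and a tally of atomic operations."""

    def __init__(self):
        self.refcount = 1    # the driver's own reference
        self.atomic_ops = 0

    def get(self):
        self.refcount += 1
        self.atomic_ops += 1

    def put(self):
        self.refcount -= 1
        self.atomic_ops += 1


def release_with_get_put(page):
    page.get()  # take a reference on behalf of the stack
    page.put()  # drop the driver's reference


def release_by_donation(page):
    # The stack simply inherits the driver's reference; the count is untouched.
    pass


old, new = Page(), Page()
release_with_get_put(old)
release_by_donation(new)
print(old.refcount, old.atomic_ops, new.refcount, new.atomic_ops)
```

Both paths leave the page with a single reference owned by the stack, but the donation path gets there without the increment and decrement that cancel each other out.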
-
Kiran Patil authored
Fixed mapping of SEID is removed from specification. Hence this patch removes code which was using hard coded base VEB SEID. Changed FCoE code to use "hw->pf_id" to obtain correct "idx" and verified. Removed defines for BASE VSI/VEB SEID and BASE_PF_SEID since it is not used anymore. Change-ID: Id507cf4b1fae1c0145e3f08ae9ea5846ea5840de Signed-off-by: Kiran Patil <kiran.patil@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Anjali Singhai Jain authored
This patch falls back to enabling unicast, multicast and broadcast promiscuous mode when the driver must disable its use of "default port" aka defport mode (which is normally used to provide a promiscuous mode), due to internal incompatibility with Multiple Function per Port (aka MFP). The situation that requires this patch is when Physical Function 0 is the device being used, and it can support SR-IOV when MFP is enabled, via the driver creating a VEB on an MFP enabled adapter. Change-ID: Ie90b00d0d58782a5dfcf2c3c9725a2eb90bd63d8 Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Anjali Singhai Jain authored
This patch adds a workaround for cases where we might have interrupts that got lost but WB happened. If that happens without this patch we will see a tx_timeout. To work around it, this patch goes ahead and reschedules NAPI in that situation, if NAPI is not already scheduled. We also add a counter in ethtool to keep track of when we detect a case of tx_lost_interrupt. Note: napi_reschedule() can be safely called from process/service_task context and is done in other drivers as well without an issue. Change-ID: I00f98f1ce3774524d9421227652bef20fcbd0d20 Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
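The workaround can be pictured with the following sketch. It is an assumption-laden model: the `TxRing` fields and the detection condition (completed descriptors waiting while NAPI is idle) are invented for illustration and do not reproduce the driver's actual checks.

```python
class TxRing:
    """Hypothetical tx ring state as seen from the periodic service task."""

    def __init__(self):
        self.pending_writebacks = 0  # descriptors written back but not yet cleaned
        self.napi_scheduled = False
        self.tx_lost_interrupt = 0   # counter exposed through statistics


def service_task_check(ring):
    """Reschedule NAPI if completions are waiting and nobody is going to clean them."""
    if ring.pending_writebacks and not ring.napi_scheduled:
        ring.tx_lost_interrupt += 1
        ring.napi_scheduled = True   # stands in for rescheduling NAPI
        return True
    return False


ring = TxRing()
idle = service_task_check(ring)       # nothing pending, nothing to do

ring.pending_writebacks = 3           # write-back happened, interrupt was lost
rescued = service_task_check(ring)
again = service_task_check(ring)      # NAPI already scheduled, no double count
print(idle, rescued, again, ring.tx_lost_interrupt)
```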
-
Jesse Brandeburg authored
This patch makes use of a pointer called hw consistent in the i40e_remove function. Change-ID: Idacc7ff0a09a68289c57457a78618bf5497de077 Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Mitch Williams authored
Support packet split receive on VFs. This is off by default but can be enabled using ethtool private flags. Because we need to trigger a reset from outside of i40evf_main.c, create a new function to do so, and export it. Also update copyright year in file headers. Change-ID: I721aa5d70113d3d6d94102e5f31526f6fc57cbbb Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Jesse Brandeburg authored
There was a completely unused file "dump" in debugfs that never panned out to be useful. Change-ID: I12bb9e37b5a83299725dda815a8746157baf6562 Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Mitch Williams authored
We have a define for this, use it. No functional change. Change-ID: Ic0e3ea4f562e46de63b2a8de07f291ccc10205fd Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
David S. Miller authored
Jiri Benc says:

====================
vxlan: clean up rx path, consolidating extension handling

The rx path of VXLAN turned over time into kind of spaghetti code. The rx processing is split between vxlan_udp_encap_recv and vxlan_rcv but in an artificial way: vxlan_rcv is just called at the end of vxlan_udp_encap_recv, continuing the rx processing where vxlan_udp_encap_recv left it. There's no clear border between those two functions.

It makes sense to combine those functions into one; this will actually be needed for VXLAN-GPE, where we'll need to skip part of the processing, which is hard to do with the current code. However, both functions are too long already.

This patchset shortens them, consolidating the extension handling that is spread all around and moving it to separate functions. (Later patchsets will do more consolidation in other parts of the functions with the final goal of merging vxlan_udp_encap_recv and vxlan_rcv.)

In the process of consolidating the extension handling, I needed to deal with the vni field in a generic way, as its lower 8 bits mean different things for different extensions. While cleaning up the code to strictly distinguish between "vni" and "vni field" (which contains vni plus an additional byte), I also converted the code not to convert endianness back and forth.

The full picture can be seen at: https://github.com/jbenc/linux-vxlan/commits/master
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Benc authored
For metadata based tunnels, VNI is ignored when doing vxlan device lookups (because such tunnel receives all VNIs). However, this was not honored by vxlan_xmit_one when doing encapsulation bypass. Move the check for metadata based tunnel to the common place where it belongs. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Benc authored
When there are unrecognized flags present in the vxlan header, it doesn't make much sense to return the packet for further UDP processing, especially considering that for other invalid flag combinations we drop the packet because of previous checks. This means we return positive value only at the beginning of the function where tun_dst is not yet allocated. This allows us to get rid of the bad_flags and error jump labels. When we're dropping packet, we need to free tun_dst now. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Benc authored
Bring the extension handling to a single place and move the actual handling logic out of vxlan_udp_encap_recv as much as possible. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Benc authored
To make vxlan_udp_encap_recv shorter and more comprehensible. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Benc authored
Some of the parameters are not needed. Simplify the caller of this function in preparation for making vxlan rx more comprehensible. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Benc authored
Prevent repeated conversions from and to network order in the fast path. To achieve this, define all flag constants in big endian order and store VNI as __be32. To prevent confusion between the actual VNI value and the VNI field from the header (which contains additional reserved byte), strictly distinguish between "vni" and "vni_field". Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
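The distinction between "vni" and "vni_field" can be shown with a short sketch based on the 8-byte VXLAN header layout (a flags word followed by a 24-bit VNI and one low byte). The helper names are made up for this example, and Python cannot express the byte-order typing that `__be32` gives in C, so only the field layout is illustrated.

```python
import struct

VXLAN_HF_VNI = 0x08000000  # "VNI present" flag in the first header word


def parse_header(header):
    """Return (flags_word, vni_field) from the first 8 bytes, read in network order."""
    return struct.unpack("!II", header[:8])


def vni_from_field(vni_field):
    """Drop the low byte of the field to get the 24-bit VNI."""
    return vni_field >> 8


def field_from_vni(vni):
    """Place a 24-bit VNI in the upper three bytes of the field."""
    return (vni & 0xFFFFFF) << 8


header = struct.pack("!II", VXLAN_HF_VNI, field_from_vni(42))
flags_word, vni_field = parse_header(header)
print(hex(flags_word), hex(vni_field), vni_from_field(vni_field))
```

Keeping the two names apart avoids comparing a raw header field, low byte included, against a plain VNI value.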
-
Jiri Benc authored
Currently, pointer to the vxlan header is kept in a local variable. It has to be reloaded whenever the pskb pull operations are performed which usually happens somewhere deep in called functions. Create a vxlan_hdr function and use it to reference the vxlan header instead. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
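Why an accessor is safer than a cached pointer can be sketched with a buffer that gets replaced underneath the caller. The `Skb` class and its `pull` method are hypothetical; `pull` simply models an operation that may move the packet data to a new buffer.

```python
class Skb:
    """Toy packet buffer whose backing storage may be replaced."""

    def __init__(self, payload, hdr_offset=0):
        self.data = bytearray(payload)
        self.hdr_offset = hdr_offset

    def pull(self):
        # Model a pull that reallocates: same content, different buffer.
        self.data = bytearray(self.data)


def vxlan_hdr(skb):
    """Always derive the header view from the current buffer."""
    start = skb.hdr_offset
    return memoryview(skb.data)[start:start + 8]


skb = Skb(bytes([0x08, 0, 0, 0, 0, 0, 0x2A, 0]))
cached = vxlan_hdr(skb)   # the "local variable" approach

skb.pull()                # buffer is replaced somewhere deep in a callee
skb.data[0] = 0xFF        # later writes go to the new buffer

print(cached[0], vxlan_hdr(skb)[0])
```

The cached view still refers to the old buffer, while the accessor reflects the current one. In C the stale pointer would be a use-after-free rather than merely outdated data.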
-
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
David S. Miller authored
Jeff Kirsher says:

====================
40GbE Intel Wired LAN Driver Updates 2016-02-17

This series contains updates to i40e/i40evf only (again).

Jesse moves sync_vsi_filters() up in the service_task because it may need to request a reset, and we do not want to wait another round of service task time. Refactored enable_icr0() so that the caller decides whether the CLEARPBA (clear pending events) bit will be set while re-enabling the interrupt. Also provides the "Don't Give Up" patch, where the driver will keep polling, trying to allocate receive buffers until it succeeds. This should keep all receive queues running even in the face of memory pressure. Cleans up the debugging helpers by putting everything in hex to be consistent.

Neerav updates the DCB firmware version related checks to apply to X710 and XL710 only, since the checks are not required for X722 devices.

Shannon adds the use of the new shared MAC filter bit for multicast and broadcast filters in order to make better use of the filters available from the device. Added a parameter to allow the driver to set the enable/disable of statistics gathering in the hardware switch. Also the L2 cloud filtering parameter is removed since it was never used.

Anjali refactors the force_wb and WB_ON_ITR functionality since Force-WriteBack functionality in X710/XL710 devices has been moved out of the clean routine and into the service task, so we need to make sure WriteBack-On-ITR is separated out since it is still called from clean.

Catherine changes the VF driver string to reflect all the products that are supported.

Mitch refactors the packet split receive code to properly use half-pages for receives. Also changes the use of bitwise operators to logical operators on the clean_complete variable, while making a witty reference to Mr. Spock. Cleans up (i.e. removes) the hsplit field in the ring structure and uses the existing macro to detect packet split enablement, which allows debugfs dumps of the VSI to properly show which receive routine is in use.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
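The "Don't Give Up" idea of polling until receive buffers can be allocated might look like the sketch below. The function names, the `max_polls` bound and the allocator are all hypothetical; the bound exists only so the example terminates.

```python
def refill(ring, needed, alloc):
    """Allocate up to `needed` buffers; return how many are still missing."""
    while needed:
        buf = alloc()
        if buf is None:   # allocation failed under memory pressure
            break
        ring.append(buf)
        needed -= 1
    return needed


def poll_until_filled(ring, needed, alloc, max_polls):
    """Keep polling instead of giving up after the first failed refill."""
    polls = 0
    while needed and polls < max_polls:
        needed = refill(ring, needed, alloc)
        polls += 1
    return needed, polls


# An allocator that fails twice before memory becomes available again.
results = iter([b"a", None, None, b"b", b"c"])
ring = []
missing, polls = poll_until_filled(ring, 3, lambda: next(results), max_polls=10)
print(missing, polls, ring)
```

The queue ends up fully refilled after a few polls instead of stalling at the first allocation failure.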
-
David S. Miller authored
Jiri Pirko says: ==================== rocker: do world split This patchset allows new rocker worlds to be easily added in future. Two new worlds are now under development: P4 and eBPF. The main part of the patchset is the OF-DPA carve-out. It results in an OF-DPA specific file. Clean cut. Note this patchset is based on my original attempt in October 2015. I had to rebase, included all suggestions and made a lot of small changes. The main change is to go with an all-port-one-world approach. Port world is set according to what is set up in HW. Not possible to change worlds from driver. v1->v2: patch 12/13: - split port_init into pre-init and init ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
Suggested-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
Carve out OF-DPA world specific code from the common file to the world file. This change required splitting struct rocker and struct rocker_port into world specific struct ofdpa and struct ofdpa_port. Along with this, the world specific functions and defines were renamed from prefix "rocker_" to "ofdpa_". Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
No need to push down rocker flags just to check if this is nowait or not. Let the caller handle that. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
The only purpose of passing this parameter is to check for prepare phase. The only reason for a failure in that state is if TLVs don't fit into descriptor. That is highly unlikely and if that happens, it is a driver bug. So remove this parameter from rocker_cmd_exec, and check for prepare phase in caller. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
This avoids need to alloc/free wait structure for every command call. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
Be consistent with the rest of the setting functions, and pass "learning" as a bool function parameter. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
This is another step on the way to per-world clean cut. Introduce world ops hooks which each world can implement in world-specific way. Also introduce world infrastructure along with OF-DPA world stub. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
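The world ops idea can be sketched as a table of optional hooks that the common code dispatches through. Everything here is hypothetical (the `WorldOps` fields, the registry, the "demo" world) and is meant only to show the shape of the pattern, with an empty stub standing in for the OF-DPA world.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class WorldOps:
    """Hooks a world may implement; unset hooks are skipped by common code."""
    kind: str
    port_init: Optional[Callable[[dict], int]] = None


WORLDS = {}


def register_world(ops):
    WORLDS[ops.kind] = ops


def world_port_init(kind, port):
    """Common-code wrapper: call the world's hook if it provides one."""
    ops = WORLDS[kind]
    if ops.port_init is None:
        return 0
    return ops.port_init(port)


def demo_port_init(port):
    port["ready"] = True
    return 0


register_world(WorldOps(kind="ofdpa"))                           # stub, no hooks yet
register_world(WorldOps(kind="demo", port_init=demo_port_init))  # made-up second world

stub_port, demo_port = {}, {}
rc_stub = world_port_init("ofdpa", stub_port)
rc_demo = world_port_init("demo", demo_port)
print(rc_stub, stub_port, rc_demo, demo_port)
```

The common code never needs to know which world it is talking to, so adding a new world means registering another set of hooks.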
-
Jiri Pirko authored
And take some other related things along. They are going to be pushed into the OF-DPA part anyway. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-