Commits · 124905012db8bdeebf5a7d1ddc841eaadda84a75 · Kirill Smelkov / linux

31 Oct, 2016 40 commits

i40e: replace PTP Rx timestamp hang logic · 12490501

Jacob Keller authored Oct 05, 2016

The current Rx timestamp hang logic is not very robust because it does
not notice a register is hung until all four timestamps have been
latched and we wait a full 5 seconds. Replace this logic with a newer Rx
hang detection based on storing the jiffies when we first notice
a receive timestamp event. We store each register's time separately,
along with a flag indicating if it is currently latched. Upon first
transitioning to latch, we will update the latch_events[i] jiffies
value. This indicates the time we first noticed this event. The watchdog
routine will simply check that the either the flag has been cleared, or
we have passed at least one second. In this case, it is able to clear
the Rx timestamp register under the assumption that it was for a dropped
frame. The benefit if this strategy is that we should be able to
detect and clear out stalled RXTIME_H registers before we exhaust the
supply of 4, and avoid complete stall of Rx timestamp events.

Change-ID: Id55458c0cd7a5dd0c951ff2b8ac0b2509364131f
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

12490501

i40e: use a mutex instead of spinlock in PTP user entry points · 19551262

Jacob Keller authored Oct 05, 2016

We need a locking mechanism to protect the hardware SYSTIME register
which is split over 2 values, and has internal hardware latching. We
can't allow multiple accesses at the same time. However....

The spinlock_t is overkill here, especially use of spin_lock_irqsave,
since every PTP access will halt hardirqs. Notice that the only places
which need the SYSTIME value are user context and are capable of sleeping.
Thus, it is safe to use a mutex here instead of the spinlock.

Change-ID: I971761a89b58c6aad953590162e85a327fbba232
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

19551262

i40e: correct check for reading TSYNINDX from the receive descriptor · 144ed176

Jacob Keller authored Oct 05, 2016

When hardware has taken a timestamp for a received packet, it indicates
which RXTIME register the timestamp was placed in by some bits in the
receive descriptor. It uses 3 bits, one to indicate if the descriptor
index is valid (ie: there was a timestamp) and 2 bits to indicate which
of the 4 registers to read. However, the driver currently does not check
the TSYNVALID bit and only checks the index. It assumes a zero index
means no timestamp, and a non zero index means a timestamp occurred.
While this appears to be true, it prevents ever reading a timestamp in
RXTIME[0], and causes the first timestamp the device captures to be
ignored.

Fix this by using the TSYNVALID bit correctly as the true indicator of
whether the packet has an associated timestamp.

Also rename the variable rsyn to tsyn as this is more descriptive and
matches the register names.

Change-ID: I4437e8f3a3df2c2ddb458b0fb61420f3dafc4c12
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

144ed176

i40e: remove duplicate add/delete adminq command code for filters · 00936319

Jacob Keller authored Oct 05, 2016

We duplicate some code around adding and deleting filters using the
adminq interface. This is prone to errors in case there are bugs. Use
functions which extract the logic to their own portion so that we don't
duplicate it twice in code.

Change-ID: I60d68aeb887976787dec00b23ab386a106e61465
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

00936319

i40e: avoid looping to check whether we're in VLAN mode · cbebb85f

Jacob Keller authored Oct 05, 2016

We determine that a VSI is in vlan_mode whenever it has any filters
with a VLAN other than -1 (I40E_VLAN_ALL). The previous method of doing
so was to perform a loop whenever we needed the check. However, we can
notice that only place where filters are added (i40e_add_filter) can
change the condition from false to true, and the only place we can
return to false is in i40e_vsi_sync_filters_subtask. Thus, we can remove
the loop and use a boolean directly.

Doing this avoids looping over filters repeatedly especially while we're
already inside a loop over all the filters. This should reduce the
latency of filter operations throughout the driver.

Change-ID: Iafde08df588da2a2ea666997d05e11fad8edc338
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

cbebb85f

i40e: fix MAC filters when removing VLANs · 84f5ca6c

Alan Brady authored Oct 05, 2016

Currently there exists a bug where adding at least one VLAN and then
removing all VLANs leaves the mac filters for the VSI with an incorrect
value for 'vid' which indicates the mac filter's VLAN status.

The current implementation for handling the removal of VLANs is wrong
for a couple reasons. The first is that when i40e_vsi_kill_vlan
iterates through the MAC filters, it fails to account for the MAC filter
status; i.e. it's not accommodating for filters that are about to be
deleted. The second problem is that MAC filters can be deleted in other
places (specifically i40e_set_rx_mode). Thus if it occurs that all the
VLAN MAC filters get deleted we need to switch out of VLAN mode, but the
code path through i40e_vsi_kill_vlan has already been executed and we're
now stuck in VLAN mode.

This patch fixes the issue by removing the check from i40e_vsi_kill_vlan
and puts the check instead in i40e_sync_vsi_filters where we're
guaranteed to see all filter deletions and can properly detect when we
need to switch out of VLAN mode.

Change-ID: Ib38fe6034b356eee9a0e20b8a9eeed5ff2debcd9
Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

84f5ca6c

i40e: properly cleanup on allocation failure in i40e_sync_vsi_filters · 4a2ce27b

Jacob Keller authored Oct 05, 2016

Currently, we fail to correctly restore filters on the temporary add
list when we fail to allocate memory either for deletion or addition.
Replace calls to "goto out;" with calls to a new location that correctly
handles memory allocation failures.

Note that it is safe for us to call i40e_undo_filter_entries on the
tmp_del_list even after we've deleted filters because at this point it
will be empty, so we don't need to separate the logic for add and
delete failure.

Change-Id: Iee107fd219c6e03e2fd9645c2debf8e8384a8521
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

4a2ce27b

i40e: store MAC/VLAN filters in a hash with the MAC Address as key · 278e7d0b

Jacob Keller authored Oct 05, 2016

Replace the mac_filter_list with a static size hash table of 8bits. The
primary advantage of this is a decrease in latency of operations related
to searching for specific MAC filters, including .set_rx_mode. Using
a linked list resulted in several locations which were O(n^2). Using
a hash table should give us latency growth closer to O(n*log(n)).

Change-ID: I5330bd04053b880e670210933e35830b95948ebb
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>

278e7d0b

i40e: implement __i40e_del_filter and use where applicable · 290d2557

Jacob Keller authored Oct 05, 2016

When inside a loop where we call i40e_del_filter we use an O(n^2)
pattern where i40e_del_filter calls i40e_find_filter for us. We can
avoid this O(n^2) logic by factoring a function, __i40e_del_filter() out
from the i40e_del_filter code. This allows us to re-use the delete logic
where appropriate without having to search for the filter twice.

This new function benefits several functions including i40e_vsi_add_vlan,
i40e_vsi_kill_vlan, i40e_del_mac_vlan_all, and i40e_vsi_release.

Change-ID: I75fabe0f53bf73f56b80d342e5fdcfcc28f4d3eb
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

290d2557

i40e: When searching all MAC/VLAN filters, ignore removed filters · 57b341d6

Jacob Keller authored Oct 05, 2016

When adding new MAC address filters, the driver determines if it should
behave in VLAN mode (where all MAC addresses get assigned to every
existing VLAN) or in non-VLAN mode where MAC addresses get assigned the
VLAN_ANY identifier. Under some circumstances it is possible that a VLAN
has been marked for removal (such that all filters of that VLAN are set
to I40E_FILTER_REMOVE), and a subsequent call to i40e_put_mac_in_vlan
may occur prior to the driver subtask that syncs filters to the
hardware.

In this case, we may add filters to the new removed VLAN, even though it
should have been removed. This is most obvious when first adding a new
VLAN. We will delete all filters which are in I40E_VLAN_ANY (-1) and
then re-add them as in VLAN 0 (untagged). Then before we sync filters,
we will add new MAC address filter, which will be added to every VLAN
that exists. Unfortunately, this will include I40E_VLAN_ANY, so we will
end up incorrectly adding filters to the -1 VLAN. This can be fixed by
simply skipping all filters which are marked for removal.

A similar check is not necessary in i40e_del_mac_all_vlan, since we are
deleting, and any filter which we find already marked for removal would
simply be deleted again, which doesn't cause any issues.

Change-Id: I7962154013ce02fe950584690aeeb3ed853d0086
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

57b341d6

i40e: refactor i40e_put_mac_in_vlan to avoid changing f->vlan · 5feb3d7b

Jacob Keller authored Oct 05, 2016

When a PVID has been assigned to a VSI, the function
i40e_put_mac_in_vlan arbitrarily modifies all filters
to have the same VLAN. This is obviously incorrect
because it could be modifying active filters without
putting them into the NEW state. The correct method
is to remove then re-add filters which is already done
in the code where we assign the PVID.

Fix this issue and a few other minor nits at the same
time. First, when we have a PVID don't even bother
looping and simply add the filter with the PVID immediately.

In the case of the loop, we now can remove several checks.
We also don't need to use i40e_find_filter first before
calling i40e_add_filter, since i40e_add_filter implicitly
does a lookup already.

Finally, update the return semantics of this function so
that on failure to add a filter it returns NULL, but on
success, it returns the last filter added. Otherwise,
we're just returning the last filter in the list. An
alternative fix might be to return 0 or an error code,
but this is pretty invasive to every call site.

Change-ID: I2325dfd843aec76d89fb0d7cb0e7c4f290a34840
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

5feb3d7b

i40e: move i40e_put_mac_in_vlan and i40e_del_mac_all_vlan · 35ec2ff3

Jacob Keller authored Oct 05, 2016

A future patch will be modifying these functions and making a call to
a static function which currently is defined after these functions. Move
them in a separate patch to ease review and ensure the moved code is
correct.

Change-ID: I2ca7fd4e10c0c07ed2291db1ea41bf5987fc6474
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

35ec2ff3

i40e: make use of __dev_uc_sync and __dev_mc_sync · 6622f5cd

Jacob Keller authored Oct 05, 2016

The kernel provides __dev_uc_sync and __dev_mc_sync in order for drivers
which need individual notification of add and delete for each filter.
These functions allow us to vastly simplify our .set_rx_mode handler. We
need to implement two functions for sync and unsync which add and remove
filters respectively.

This change avoids a very complex and inefficient algorithm which
resulted in an abnormal latency for the .set_rx_mode NDO operation. The
resulting code after this change is more readable, more efficient, and
less code.

Due to the callback signature used by these functions we also must
update several other functions to take a const u8 * pointer.

Change-Id: I2ca7fd4e10c0c07ed2291db1ea41bf5987fc6474
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

6622f5cd

i40e: drop is_vf and is_netdev fields in struct i40e_mac_filter · 1bc87e80

Jacob Keller authored Oct 05, 2016

Originally the is_vf and is_netdev fields were added in order to
distinguish between VF and netdev filters in a single VSI. However, it
can be noted that we use separate VSI for SRIOV VFs and for netdev VSI.
Thus, since a single VSI should only ever have one type of filter, we
can simply remove the checks and remove the typing.

In a similar fashion, we can note that the only remaining way to get
multiple filters of a single type is through a debug command that was
added to debugfs. This command is useless in practice, and results in
causing bugs if we keep counter tracking but lose the is_vf and
is_netdev protections as desired above.

Since the only time we'd actually have a counter value besides 0 and
1 is through use of this debugfs hook, we can remove this unnecessary
command, and the entire counter logic it required.

We vastly simplify mac filters by removing

(a) the distinction between VF and netdev filters
(b) counting logic
(c) the ability to add and remove filters bypassing the stack via debugfs

Change-ID: Idf916dd2a1159b1188ddbab5bef6b85ea6bf27d9
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

1bc87e80

i40e: Add missing \n to end of dev_err message · ff00f3a9

Colin Ian King authored Oct 04, 2016

Trival fix, dev_err message is missing a \n, so add it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ff00f3a9

Merge branch 'bridge-PIM-hello' · 17a032b7

David S. Miller authored Oct 31, 2016

Nikolay Aleksandrov says:

====================
bridge: add support for PIM hello router ports

The first 3 patches of this set do minor cleanups and add some helpers to
the PIM header file. Patch 4 adds a way to detect mcast router ports via
PIM hello messages, they're marked as temporary and are not considered for
querier. There's more detailed information in patch 4's commit message.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

17a032b7

bridge: mcast: add router port on PIM hello message · 91b02d3d

Nikolay Aleksandrov authored Oct 31, 2016

When we receive a PIM Hello message on a port we can consider that it
has a multicast router attached, thus it is correct to add it to the
router list. The only catch is it shouldn't be considered for a querier.

Using Daniel's description:
leaf-11  leaf-12  leaf-13
       \   |    /
        bridge-1
         /    \
    host-11  host-12

 - all ports in bridge-1 are in a single vlan aware bridge
 - leaf-11 is the IGMP querier
 - leaf-13 is the PIM DR
 - host-11 TXes packets to 226.10.10.10
 - bridge-1 only forwards the 226.10.10.10 traffic out the port to
   leaf-11, it should also forward this traffic out the port to leaf-13
Suggested-by: Daniel Walton <dwalton@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

91b02d3d

net: pim: add all RFC7761 message types · 56245cae

Nikolay Aleksandrov authored Oct 31, 2016

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

56245cae

net: pim: add a helper to check for IPv4 all pim routers address · 20bb6ce9

Nikolay Aleksandrov authored Oct 31, 2016

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

20bb6ce9

net: pim: add common pimhdr struct and helpers · 556d299f

Nikolay Aleksandrov authored Oct 31, 2016

Add the common pimhdr structure and helpers to access it, also cleanup the
format of the header file.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

556d299f

Merge branch 'qed-next' · f3a6f592

David S. Miller authored Oct 31, 2016

Yuval Mintz says:

====================
qed*: Patch series

This series does several things. The bigger changes:

 - Add new notification APIs [& Defaults] for various fields.
The series then utilizes some of those qed <-> qede APIs to bass WoL
support upon.

 - Change the resource allocation scheme to receive the values from
management firmware, instead of equally sharing resources between
functions [that might not need those]. That would, e.g., allow us to
configure additional filters to network interfaces in presence of
storage [PCI] functions from same adapter.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

f3a6f592

qed: Learn resources from management firmware · 2edbff8d

Tomer Tayar authored Oct 31, 2016

Currently, each interfaces assumes it receives an equal portion
of HW/FW resources, but this is wasteful - different partitions
[and specifically, parititions exposing different protocol support]
might require different resources.

Implement a new resource learning scheme where the information is
received directly from the management firmware [which has knowledge
of all of the functions and can serve as arbiter].
Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2edbff8d

qed: Use VF-queue feature · 5a1f965a

Mintz, Yuval authored Oct 31, 2016

Driver sets several restrictions about the number of supported VFs
according to available HW/FW resources.
This creates a problem as there are constellations which can't be
supported [as limitation don't accurately describe the resources],
as well as holes where enabling IOV would fail due to supposed
lack of resources.

This introduces a new interal feature - vf-queues, which would
be used to lift some of the restriction and accurately enumerate
the queues that can be used by a given PF's VFs.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5a1f965a

qed: Learn of RDMA capabilities per-device · 6927e826

Mintz, Yuval authored Oct 31, 2016

Today, RDMA capabilities are learned from management firmware
which provides a per-device indication for all interfaces.
Newer management firmware is capable of providing a per-device
indication [would later be extended to either RoCE/iWARP].

Try using this newer learning mechanism, but fallback in case
management firmware is too old to retain current functionality.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6927e826

qede: Decouple ethtool caps from qed · d7455f6e

Mintz, Yuval authored Oct 31, 2016

While the qed_lm_maps is closely tied with the QED_LM_* defines,
when iterating over the array use actual size instead of the qed
define to prevent future possible issues.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d7455f6e

qed*: Add support for WoL · 14d39648

Mintz, Yuval authored Oct 31, 2016

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

14d39648

qed: Add nvram selftest · 7a4b21b7

Mintz, Yuval authored Oct 31, 2016

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7a4b21b7

qed*: Management firmware - notifications and defaults · 0fefbfba

Sudarsana Kalluru authored Oct 31, 2016

Management firmware is interested in various tidbits about
the driver - including the driver state & several configuration
related fields [MTU, primtary MAC, etc.].
This adds the necessray logic to update MFW with such configurations,
some of which are passed directly via qed while for others APIs
are provide so that qede would be able to later configure if needed.

This also introduces a new default configuration for MTU which would
replace the default inherited by being an ethernet device.
Signed-off-by: Sudarsana Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0fefbfba

solos-pci: use permission-specific DEVICE_ATTR variants · 89d9123e

Julia Lawall authored Oct 29, 2016

Use DEVICE_ATTR_RW for read-write attributes.  This simplifies the
source code, improves readbility, and reduces the chance of
inconsistencies.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@rw@
declarer name DEVICE_ATTR;
identifier x,x_show,x_store;
@@

DEVICE_ATTR(x, \(0644\|S_IRUGO|S_IWUSR\), x_show, x_store);

@script:ocaml@
x << rw.x;
x_show << rw.x_show;
x_store << rw.x_store;
@@

if not (x^"_show" = x_show && x^"_store" = x_store)
then Coccilib.include_match false

@@
declarer name DEVICE_ATTR_RW;
identifier rw.x,rw.x_show,rw.x_store;
@@

- DEVICE_ATTR(x, \(0644\|S_IRUGO|S_IWUSR\), x_show, x_store);
+ DEVICE_ATTR_RW(x);
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

89d9123e

ptp: use permission-specific DEVICE_ATTR variants · 63215705

Julia Lawall authored Oct 29, 2016

Use DEVICE_ATTR_RO for read only attributes.  This simplifies the
source code, improves readbility, and reduces the chance of
inconsistencies.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@ro@
declarer name DEVICE_ATTR;
identifier x,x_show;
@@

DEVICE_ATTR(x, \(0444\|S_IRUGO\), x_show, NULL);

@script:ocaml@
x << ro.x;
x_show << ro.x_show;
@@

if not (x^"_show" = x_show) then Coccilib.include_match false

@@
declarer name DEVICE_ATTR_RO;
identifier ro.x,ro.x_show;
@@

- DEVICE_ATTR(x, \(0444\|S_IRUGO\), x_show, NULL);
+ DEVICE_ATTR_RO(x);
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

63215705

bpf, inode: add support for symlinks and fix mtime/ctime · 0f98621b

Daniel Borkmann authored Oct 29, 2016

While commit bb35a6ef ("bpf, inode: allow for rename and link ops")
added support for hard links that can be used for prog and map nodes,
this work adds simple symlink support, which can be used f.e. for
directories also when unpriviledged and works with cmdline tooling that
understands S_IFLNK anyway. Since the switch in e27f4a94 ("bpf: Use
mount_nodev not mount_ns to mount the bpf filesystem"), there can be
various mount instances with mount_nodev() and thus hierarchy can be
flattened to facilitate object sharing. Thus, we can keep bpf tooling
also working by repointing paths.

Most of the functionality can be used from vfs library operations. The
symlink is stored in the inode itself, that is in i_link, which is
sufficient in our case as opposed to storing it in the page cache.
While at it, I noticed that bpf_mkdir() and bpf_mkobj() don't update
the directories mtime and ctime, so add a common helper for it called
bpf_dentry_finalize() that takes care of it for all cases now.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

0f98621b

ldmvsw: tx queue stuck in stopped state after LDC reset · 8778b276

Aaron Young authored Oct 28, 2016

The following patch fixes an issue with the ldmvsw driver where
the network connection of a guest domain becomes non-functional after
the guest domain has panic'd and rebooted.

The root cause was determined to be from the following series of
events:

1. Guest domain panics - resulting in the guest no longer processing
   network packets (from ldmvsw driver)
2. The ldmvsw driver (in the control domain) eventually exerts flow
   control due to no more available tx drings and stops the tx queue
   for the guest domain
3. The LDC of the network connection for the guest is reset when
   the guest domain reboots after the panic.
4. The LDC reset event is received by the ldmvsw driver and the ldmvsw
   responds by clearing the tx queue for the guest.
5. ldmvsw waits indefinitely for a DATA ACK from the guest - which is
   the normal method to re-enable the tx queue. But the ACK never comes
   because the tx queue was cleared due to the LDC reset.

To fix this issue, in addition to clearing the tx queue, re-enable the
tx queue on a LDC reset. This prevents the ldmvsw from getting caught in
this deadlocked state of waiting for a DATA ACK which will never come.
Signed-off-by: Aaron Young <Aaron.Young@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8778b276

Merge branch 'xps-DCB' · 04f762e8

David S. Miller authored Oct 31, 2016

Alexander Duyck says:

====================
Add support for XPS when using DCB

This patch series enables proper isolation between traffic classes when
using XPS while DCB is enabled.  Previously enabling XPS would cause the
traffic to be potentially pulled from one traffic class into another on
egress.  This change essentially multiplies the XPS map by the number of
traffic classes and allows us to do a lookup per traffic class for a given
CPU.

To guarantee the isolation I invalidate the XPS map for any queues that are
moved from one traffic class to another, or if we change the number of
traffic classes.

v2: Added sysfs to display traffic class
    Replaced do/while with for loop
    Cleaned up several other for for loops throughout the patch
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

04f762e8

net: Add support for XPS with QoS via traffic classes · 184c449f

Alexander Duyck authored Oct 28, 2016

This patch adds support for setting and using XPS when QoS via traffic
classes is enabled.  With this change we will factor in the priority and
traffic class mapping of the packet and use that information to correctly
select the queue.

This allows us to define a set of queues for a given traffic class via
mqprio and then configure the XPS mapping for those queues so that the
traffic flows can avoid head-of-line blocking between the individual CPUs
if so desired.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

184c449f

net: Refactor removal of queues from XPS map and apply on num_tc changes · 6234f874

Alexander Duyck authored Oct 28, 2016

This patch updates the code for removing queues from the XPS map and makes
it so that we can apply the code any time we change either the number of
traffic classes or the mapping of a given block of queues. This way we
avoid having queues pulling traffic from a foreign traffic class.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6234f874

net: Add sysfs value to determine queue traffic class · 8d059b0f

Alexander Duyck authored Oct 28, 2016

Add a sysfs attribute for a Tx queue that allows us to determine the
traffic class for a given queue. This will allow us to more easily
determine this in the future. It is needed as XPS will take the traffic
class for a group of queues into account in order to avoid pulling traffic
from one traffic class into another.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8d059b0f

net: Move functions for configuring traffic classes out of inline headers · 9cf1f6a8

Alexander Duyck authored Oct 28, 2016

The functions for configuring the traffic class to queue mappings have
other effects that need to be addressed. Instead of trying to export a
bunch of new functions just relocate the functions so that we can
instrument them directly with the functionality they will need.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9cf1f6a8

driver: tun: Use new macro SOCK_IOC_TYPE instead of literal number 0x89 · 20861f26

Gao Feng authored Oct 27, 2016

The current codes use _IOC_TYPE(cmd) == 0x89 to check if the cmd is one
socket ioctl command like SIOCGIFHWADDR. But the literal number 0x89 may
confuse readers. So create one macro SOCK_IOC_TYPE to enhance the readability.
Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

20861f26

net: add an ioctl to get a socket network namespace · c62cce2c

Andrey Vagin authored Oct 24, 2016

Each socket operates in a network namespace where it has been created,
so if we want to dump and restore a socket, we have to know its network
namespace.

We have a socket_diag to get information about sockets, it doesn't
report sockets which are not bound or connected.

This patch introduces a new socket ioctl, which is called SIOCGSKNS
and used to get a file descriptor for a socket network namespace.

A task must have CAP_NET_ADMIN in a target network namespace to
use this ioctl.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

c62cce2c

mv643xx_eth: Properly resolve merge conflict. · 2a43ca0a

David S. Miller authored Oct 31, 2016

The second SET_NETDEV_DEV() in the hunk should be
removed.
Reported-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

2a43ca0a