Commits · 883ce97d25b019ce8437ba6f49e38302ca5ec23f · Kirill Smelkov / linux

17 Feb, 2016 2 commits

bnx2x: Add Geneve inner-RSS support · 883ce97d

Yuval Mintz authored Feb 16, 2016

This adds the ability to perform RSS hashing based on encapsulated
headers for a geneve-encapsulated packet.

This also changes the Vxlan implementation in bnx2x to be uniform
for both vxlan and geneve [from configuration perspective].
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

883ce97d

bnx2x: Remove unneccessary EXPORT_SYMBOL · 44520464

Yuval Mintz authored Feb 16, 2016

bnx2x_schedule_sp_rtnl is exported by bnx2x, although no other module
uses it.
Reported-by: Benjamin Poirier <bpoirier@suse.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

44520464

16 Feb, 2016 25 commits

tipc: refactor node xmit and fix memory leaks · 4952cd3e

Richard Alpe authored Feb 11, 2016

Refactor tipc_node_xmit() to fail fast and fail early. Fix several
potential memory leaks in unexpected error paths.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4952cd3e

dmascc: Return correct error codes · 37ace20a

Amitoj Kaur Chawla authored Feb 10, 2016

This change has been made with the goal that kernel functions should
return something more descriptive than -1 on failure.

A variable `err` has been introduced for storing error codes.

The return value of kzalloc on failure should return a -1 and not a
-ENOMEM. This was found using Coccinelle. A simplified version of
the semantic patch used is:

//<smpl>
@@
expression *e;
identifier l1;
@@

e = kzalloc(...);
if (e == NULL) {
...
goto l1;
}
l1:
...
return -1
+ -ENOMEM
;
//</smpl

Furthermore, set `err` to -ENOMEM on failure of alloc_netdev(), and to
-ENODEV on failure of register_netdev() and probe_irq_off().

The single call site only checks that the return value is not 0,
hence no change is required at the call site.
Signed-off-by: Amitoj Kaur Chawla <amitoj1606@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

37ace20a

Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · 31d035a0

David S. Miller authored Feb 16, 2016

Jeff Kirsher says:

====================
1GbE Intel Wired LAN Driver Updates 2016-02-15

This series contains updates to igb only.

Shota Suzuki cleans up unnecessary flag setting for 82576 in
igb_set_flag_queue_pairs() since the default block already sets
IGB_FLAG_QUEUE_PAIRS to the correct value anyways, so the e1000_82576
code block is not necessary and we can simply fall through.  Then fixes
an issue where IGB_FLAG_QUEUE_PAIRS can now be set by using "ethtool -L"
option but is never cleared unless the driver is reloaded, so clear the
queue pairing if the pairing becomes unnecessary as a result of "ethtool
-L".

Mitch fixes the igbvf from giving up if it fails to get the hardware
mailbox lock.  This can happen when the PF-VF communication channel is
heavily loaded and causes complete communications failure between the
PF and VF drivers, so add a counter and a delay so that the driver will
now retry ten times before giving up on getting the mailbox lock.

The remaining patches in the series are from Alex Duyck, starting with the
cleaning up code that sets the MAC address.  Then refactors the VFTA and
VLVF configuration, to simplify and update to similar setups in the ixgbe
driver.  Fixed an issue were VLANs headers size was being added to the
value programmed into the RLPML registers, yet these registers already
take into account the size of the VLAN headers when determining the
maximum packet length, so we can drop the code that adds the size to
the RLPML registers.  Cleaned up the configuration of the VF port based
VLAN configuration.  Also fixed the igb driver so that we can fully
support SR-IOV or the recently added NTUPLE filtering while allowing
support for VLAN promiscuous mode.  Also added the ability to use the
bridge utility to add a FDB entry for the PF to an igb port.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

31d035a0

Merge branch 'ethtool-channels-rxfh-conflict' · 9a14b1c2

David S. Miller authored Feb 16, 2016

Jacob Keller says:

====================
ethtool: correct {GS}CHANNELS and {GS}RXFH conflict

This patch series fixes up ethtool_set_channels operation which
allowed modifying the RXFH table indirectly by reducing the number of
queues below the current max queue used by the Rx flow table. Most
drivers incorrectly allowed this to destroy the Rx flow table and
would then start by reinitializing it to default settings. However,
drivers are not able to correctly handle the conflict since there was
no way to differentiate between the default settings and the user
requested explicit settings.

To fix this, implement a new netdev private flag which we use to
indicate whether the RXFH has been user configured. If someone has
a better alternative of how to store this information, let me know.
I am not sure that priv_flags is the best solution but I have not had
any better idea.

Secondly, we add a function which just calls the driver's get_rxfh
callback to determine the current indirection table. Loop through this
and we can determine the current highest queue that will be used by
RSS.

Now, modify ethtool_set_channels to add a check ensuring that if (a)
we have had rxfh configured by user, (b) we can get the maximum RSS
queue currently used, then we ensure that the newly requested Rx count
(or combined count) is at least as high as this maximum RSS queue. The
reasoning here is that we can always safely increase the number of
queues. If we decrease the queues we must ensure that the decrease
does not go lower than the highest in-use queue for the Rx flow table.

Drivers may still need to be patched if they currently overwrite the
Rx flow table during channel configuration. If the driver currently
always resets Rx flow table when increasing number of queues it must
be patched to only do this when netif_is_rxfh_configured returns
false.

The second patch simply adds a check to ensure that all provided
channel counts fit within driver defined maximums.

The third patch fixes fm10k to correctly reconfigure the RSS reta
table whenever it is still unconfigured. This means that the default
state will provide RSS to every queue. Once the user has configured
RXFH, then we should maintain it. In addition, since the case where we
must reconfigure the RSS table in this case should now no longer
occur, add a dev_err message to indicate the user that we did so.

I have also supplied an ethtool patch to enable setting the default Rx
flow indirection table. Without this, current ethtool does not support
sending an indir_size of 0, and thus does not correctly support
configuring back to the default.

Changes in v2:
* fixed compile error
* fixed incorrect comparison with max_rx_in_use
* adjusted looping over dev_size
* removed inline on function
* dropped patch about separating combined vs asymmetric channels
* verified behavior using fm10k driver
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

9a14b1c2

fm10k: don't reinitialize RSS flow table when RXFH configured · 1012014e

Keller, Jacob E authored Feb 08, 2016

Also print an error message incase we do have to reconfigure as this
should no longer happen anymore due to ethtool changes. If it somehow
does occur, user should be made aware of it.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1012014e

ethtool: ensure channel counts are within bounds during SCHANNELS · 8bf36862

Keller, Jacob E authored Feb 08, 2016

Add a sanity check to ensure that all requested channel sizes are within
bounds, which should reduce errors in driver implementation.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8bf36862

ethtool: correctly ensure {GS}CHANNELS doesn't conflict with GS{RXFH} · d4ab4286

Keller, Jacob E authored Feb 08, 2016

Ethernet drivers implementing both {GS}RXFH and {GS}CHANNELS ethtool ops
incorrectly allow SCHANNELS when it would conflict with the settings
from SRXFH. This occurs because it is not possible for drivers to
understand whether their Rx flow indirection table has been configured
or is in the default state. In addition, drivers currently behave in
various ways when increasing the number of Rx channels.

Some drivers will always destroy the Rx flow indirection table when this
occurs, whether it has been set by the user or not. Other drivers will
attempt to preserve the table even if the user has never modified it
from the default driver settings. Neither of these situation is
desirable because it leads to unexpected behavior or loss of user
configuration.

The correct behavior is to simply return -EINVAL when SCHANNELS would
conflict with the current Rx flow table settings. However, it should
only do so if the current settings were modified by the user. If we
required that the new settings never conflict with the current (default)
Rx flow settings, we would force users to first reduce their Rx flow
settings and then reduce the number of Rx channels.

This patch proposes a solution implemented in net/core/ethtool.c which
ensures that all drivers behave correctly. It checks whether the RXFH
table has been configured to non-default settings, and stores this
information in a private netdev flag. When the number of channels is
requested to change, it first ensures that the current Rx flow table is
not going to assign flows to now disabled channels.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d4ab4286

net: fec: Add "phy-reset-active-low" property to DT · 64f10f6e

Bernhard Walle authored Feb 08, 2016

We need that for a custom hardware that needs the reverse reset
sequence.
Signed-off-by: Bernhard Walle <bernhard@bwalle.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

64f10f6e

Merge branch 'bcm7xxx-cleanups' · 12d6b917

David S. Miller authored Feb 16, 2016

Florian Fainelli says:

====================
net: phy: bcm7xxx: Misc cleanups

These two patches are cleanups to the BCM7xxx internal PHY driver:

- fix a constant name missing a X (as in BCM7XXX)
- add a macro to reduce the amount of code duplication to add new entries
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

12d6b917

net: phy: bcm7xxx: Reduce boilerplate code for 40nm EPHY · 3125c081

Florian Fainelli authored Feb 06, 2016

Introduce a macro which helps adding new 40NM EPHY entries and reduces the
amount of boilerplate code.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3125c081

net: phy: bcm7xxx: Make MII_BCM7XX_64CLK_MDIO naming consistent · 3ccc3055

Florian Fainelli authored Feb 06, 2016

The driver is BCM7xxx, we were missing an additional X in the constant naming,
fix that to be consistent.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3ccc3055

igb: Add workaround for VLAN tag stripping on 82576 · bf456abb

Alexander Duyck authored Jan 06, 2016

There was a workaround partially implemented for the 82576 that is needed
in order for VLAN tag stripping to function correctly. The original code
had side effects that would make it so the workaround was active on all
MACs. I have updated the code so that the workaround is enabled, but
limited to the 82576, or activated if we exceed the available unicast
addresses.

The workaround has a side effect of mirroring all of the traffic outgoing
from the VFs back to the PF. As such it is not recommended to use the
82576 in promiscuous mode as it will take a performance hit, though this is
now consistent with the performance as seen on the out-of-tree igb driver.

I also limited the scope of the UTA bits all being set to only when the
VMOLR register is enabled. This should limit the effects of the UTA
register so that we don't pick up any excess traffic unless promiscuous
mode has been enabled on the PF, whereas before the PF would have ended up
in something equivalent to unicast promiscuous mode with VLAN filtering
otherwise.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

bf456abb

igb: Enable use of "bridge fdb add" to set unicast table entries · 268f9d33

Alexander Duyck authored Jan 06, 2016

This change makes it so that we can use the bridge utility to add a FDB
entry for the PF to an igb port.  By doing this we can enable the VFs to
talk to virtual ports residing on top of the PF.

In addition this should also address issues with MACVLANs trying to reside
on top of the PF as well as they would have had similar issues when added
to the PF with SR-IOV enabled.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

268f9d33

igb: Drop unnecessary checks in transmit path · 9c2f186e

Alexander Duyck authored Jan 06, 2016

This patch drops several checks that we dropped from ixgbe some ago. It
should not be possible for us to be called with either of the conditional
statements returning true so we can just drop them from the hot-path.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

9c2f186e

igb: Add support for VLAN promiscuous with SR-IOV and NTUPLE · 16903caa

Alexander Duyck authored Jan 06, 2016

This change fixes things so that we can fully support SR-IOV or the
recently added NTUPLE filtering while allowing support for VLAN promiscuous
mode. By making this change we are able to support possible scenarios such
as SR-IOV with the PF connected to a Linux bridge hosting other VMs.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

16903caa

igb: Clean-up configuration of VF port VLANs · a15d9259

Alexander Duyck authored Jan 06, 2016

This patch is meant to clean-up the configuration of the VF port based VLAN
configuration. The original logic was a bit muddled and had some
undesirable side effects such as VLANs being either completely stripped
from the port or VLANs being left when they shouldn't be. The idea behind
this code is to avoid any events such as spurious spoof notifications when
we are removing one VLAN tag and replacing it with another.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

a15d9259

igb: Merge VLVF configuration into igb_vfta_set · 8b77c6b2

Alexander Duyck authored Jan 06, 2016

This change makes it so that we can merge the configuration of the VLVF
registers into the setting of the VFTA register. By doing this we simplify
the logic and make use of similar functionality that we have already added
for ixgbe making it easier to maintain both drivers.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

8b77c6b2

igb: Always enable VLAN 0 even if 8021q is not loaded · 5982a556

Alexander Duyck authored Jan 06, 2016

This patch makes it so that we always add VLAN 0. This is important as we
need to guarantee the PF can receive untagged frames in the case of SR-IOV
being enabled but VLAN filtering not being enabled in the kernel.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

5982a556

igb: Do not factor VLANs into RLPML calculation · d3836f8e

Alexander Duyck authored Jan 06, 2016

The RLPML registers already take the size of VLAN headers into account when
determining the maximum packet length. This is called out in EAS documents
for several parts including the 82576 and the i350. As such we can drop
the addition of size to the value programmed into the RLPML registers.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

d3836f8e

igb: Allow asymmetric configuration of MTU versus Rx frame size · 45693bcb

Alexander Duyck authored Jan 06, 2016

Since the igb driver is using page based receive there is no point in
limiting the Rx capabilities of the device.  The driver can receive 9K
jumbo frames at all times.  The only changes needed due to MTU changes are
updates for the FIFO sizes and flow-control watermarks.

Update the maximum frame size to reflect the 9.5K limitation of the
hardware, and replace all instances of max_frame_size with
MAX_JUMBO_FRAME_SIZE when referring to an Rx FIFO or frame.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

45693bcb

igb: Refactor VFTA configuration · 832e821c

Alexander Duyck authored Jan 06, 2016

This patch starts the clean-up process on the VFTA configuration.
Specifically in this patch I attempt to address and simplify several items
while also updating the code to bring it more inline with what is already
in ixgbe.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

832e821c

igb: clean up code for setting MAC address · c3278587

Alexander Duyck authored Jan 06, 2016

Drop a bunch of hand written byte swapping code in favor of just doing the
byte swapping ourselves. The registers are little endian registers storing
a big endian value so if we read the MAC address array as little endian
then we will get the CPU registers into the proper layout.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

c3278587

igb/igbvf: don't give up · 9ce0e8d7

Mitch Williams authored Dec 11, 2015

The driver shouldn't just give up if it fails to get the hardware
mailbox lock. This can happen in a situation where the PF-VF
communication channel is heavily loaded and causes complete
communications failure between the PF and VF drivers.

Add a counter and a delay. The driver will now retry ten times, waiting
one millisecond between retries.
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

9ce0e8d7

igb: Unpair the queues when changing the number of queues · 37a5d163

Shota Suzuki authored Dec 11, 2015

By the commit 72ddef05 ("igb: Fix oops caused by missing queue
pairing"), the IGB_FLAG_QUEUE_PAIRS flag can now be set when changing the
number of queues by "ethtool -L", but it is never cleared unless the igb
driver is reloaded.
This patch clears it if queue pairing becomes unnecessary as a result of
"ethtool -L".
Signed-off-by: Shota Suzuki <suzuki_shota_t3@lab.ntt.co.jp>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

37a5d163

igb: Remove unnecessary flag setting in igb_set_flag_queue_pairs() · ceb27759

Shota Suzuki authored Dec 11, 2015

If VFs are enabled (max_vfs >= 1), both max_rss_queues and
adapter->rss_queues are set to 2 in the case of e1000_82576.
In this case, IGB_FLAG_QUEUE_PAIRS is always set in the default block as a
result of fall-through, thus setting it in the e1000_82576 block is not
necessary.
Signed-off-by: Shota Suzuki <suzuki_shota_t3@lab.ntt.co.jp>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ceb27759

12 Feb, 2016 12 commits

Merge branch 'local-checksum-offload' · 667f0063

David S. Miller authored Feb 12, 2016

Edward Cree says:

====================
Local Checksum Offload

Re-tested VxLAN; everything else is unchanged from v4.

Changes from v4:
 * Rebased series to fix conflicts with vxlan/vxlan6 merge.

Changes from v3:
 * Fixed inverted checksum values introduced in v3.
 * Don't mangle zero checksums in GRE.
 * Clear skb->encapsulation in iptunnel_handle_offloads when not using
   CHECKSUM_PARTIAL, lest drivers incorrectly interpret that as a request
   for inner checksum offload.

Changes from v2:
 * Added support for IPv4 GRE.
 * Split out 'always set up for checksum offload' into its own patch.
 * Removed csum_help from iptunnel_handle_offloads.
 * Rewrote LCO callers to only fold once.
 * Simplified nocheck handling.

Changes from v1:
 * Enabled support in more encapsulation protocols.
   I think it now covers everything except GRE.
 * Wrote up some documentation covering TX checksum offload, LCO and RCO.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

667f0063

Documentation/networking: add checksum-offloads.txt to explain LCO · e8ae7b00
Edward Cree authored Feb 11, 2016
```
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
```
e8ae7b00

net: ip_tunnel: remove 'csum_help' argument to iptunnel_handle_offloads · 6fa79666

Edward Cree authored Feb 11, 2016

All users now pass false, so we can remove it, and remove the code that
 was conditional upon it.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6fa79666

net: gre: Implement LCO for GRE over IPv4 · 53936107

Edward Cree authored Feb 11, 2016

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

53936107

fou: enable LCO in FOU and GUE · 06f62292

Edward Cree authored Feb 11, 2016

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

06f62292

net: vxlan: enable local checksum offload · b5708501

Edward Cree authored Feb 11, 2016

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b5708501

net: enable LCO for udp_tunnel_handle_offloads() users · 21e2e7f9

Edward Cree authored Feb 11, 2016

The only protocol affected at present is Geneve.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

21e2e7f9

net: udp: always set up for CHECKSUM_PARTIAL offload · d75f1306

Edward Cree authored Feb 11, 2016

If the dst device doesn't support it, it'll get fixed up later anyway
 by validate_xmit_skb().  Also, this allows us to take advantage of LCO
 to avoid summing the payload multiple times.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d75f1306

net: local checksum offload for encapsulation · 179bc67f

Edward Cree authored Feb 11, 2016

The arithmetic properties of the ones-complement checksum mean that a
correctly checksummed inner packet, including its checksum, has a ones
complement sum depending only on whatever value was used to initialise
the checksum field before checksumming (in the case of TCP and UDP,
this is the ones complement sum of the pseudo header, complemented).
Consequently, if we are going to offload the inner checksum with
CHECKSUM_PARTIAL, we can compute the outer checksum based only on the
packed data not covered by the inner checksum, and the initial value of
the inner checksum field.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

179bc67f

Merge branch 'tcp_dccp_ports' · e51271d4

David S. Miller authored Feb 12, 2016

Eric Dumazet says:

====================
tcp/dccp: better use of ephemeral ports

Big servers have bloated bind table, making very hard to succeed
ephemeral port allocations, without special containers/namespace tricks.

This patch series extends the strategy added in commit 07f4c900
("tcp/dccp: try to not exhaust ip_local_port_range in connect()").

Since ports used by connect() are much likely to be shared among them,
we give a hint to both bind() and connect() to keep the crowds separated
if possible.

Of course, if on a specific host an application needs to allocate ~30000
ports using bind(), it will still be able to do so. Same for ~30000 connect()
to a unique 2-tuple (dst addr, dst port)

New implemetation is also more friendly to softirqs and reschedules.

v2: rebase after TCP SO_REUSEPORT changes
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

e51271d4

tcp/dccp: better use of ephemeral ports in bind() · ea8add2b

Eric Dumazet authored Feb 11, 2016

Implement strategy used in __inet_hash_connect() in opposite way :

Try to find a candidate using odd ports, then fallback to even ports.

We no longer disable BH for whole traversal, but one bucket at a time.
We also use cond_resched() to yield cpu to other tasks if needed.

I removed one indentation level and tried to mirror the loop we have
in __inet_hash_connect() and variable names to ease code maintenance.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ea8add2b

tcp/dccp: better use of ephemeral ports in connect() · 1580ab63

Eric Dumazet authored Feb 11, 2016

In commit 07f4c900 ("tcp/dccp: try to not exhaust ip_local_port_range
in connect()"), I added a very simple heuristic, so that we got better
chances to use even ports, and allow bind() users to have more available
slots.

It gave nice results, but with more than 200,000 TCP sessions on a typical
server, the ~30,000 ephemeral ports are still a rare resource.

I chose to go a step further, by looking at all even ports, and if none
was available, fallback to odd ports.

The companion patch does the same in bind(), but in opposite way.

I've seen exec times of up to 30ms on busy servers, so I no longer
disable BH for the whole traversal, but only for each hash bucket.
I also call cond_resched() to be gentle to other tasks.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1580ab63

11 Feb, 2016 1 commit

Merge branch 'net-mitigate-kmem_free-slowpath' · 3134b9f0

David S. Miller authored Feb 11, 2016

Jesper Dangaard Brouer says:

====================
net: mitigating kmem_cache free slowpath

This patchset is the first real use-case for kmem_cache bulk _free_.
The use of bulk _alloc_ is NOT included in this patchset. The full use
have previously been posted here [1].

The bulk free side have the largest benefit for the network stack
use-case, because network stack is hitting the kmem_cache/SLUB
slowpath when freeing SKBs, due to the amount of outstanding SKBs.
This is solved by using the new API kmem_cache_free_bulk().

Introduce new API napi_consume_skb(), that hides/handles bulk freeing
for the caller.  The drivers simply need to use this call when freeing
SKBs in NAPI context, e.g. replacing their calles to dev_kfree_skb() /
dev_consume_skb_any().

Driver ixgbe is the first user of this new API.

[1] http://thread.gmane.org/gmane.linux.network/384302/focus=397373
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

3134b9f0