Commits · 55e7fe5b9cd94e6accb128e6a1e5902e9018deef · Kirill Smelkov / linux

04 May, 2015 40 commits

e1000e: Do not allow CRC stripping to be disabled on 82579 w/ jumbo frames · 55e7fe5b

Alexander Duyck authored May 02, 2015

The driver wasn't allowing jumbo frames to be enabled when CRC stripping
was disabled; however, it was allowing CRC stripping to be disabled while
jumbo frames were enabled. This fixes that by making it so that the
NETIF_F_RXFCS flag cannot be set when jumbo frames are enabled on 82579 and
newer parts.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

55e7fe5b

e1000e: Cleanup handling of VLAN_HLEN as a part of max frame size · 8084b86d

Alexander Duyck authored May 02, 2015

When the VLAN_HLEN was added to the calculation for the maximum frame size
there seems to have been a number of issues added to the driver.

The first issue is that in some cases the maximum frame size for a device
never really reached the actual maximum frame size as the VLAN header
length was not included in the calculation for that value. As a result some
parts only supported a maximum frame size of either 1496 in the case of
parts that didn't support jumbo frames, and 8996 in the case of the parts
that do.

The second issue is the fact that there were several checks that weren't
updated so as a result setting an MTU of 1500 was treated as enabling jumbo
frames as the calculated value was 1522 instead of 1518. I have addressed
those by replacing ETH_FRAME_LEN with VLAN_ETH_FRAME_LEN where appropriate.

The final issue was the fact that lowering the MTU below 1500 would cause
the driver to allocate 2K buffers for the rings. This is an old issue that
was fixed several years ago in igb/ixgbe and I am addressing now by just
replacing == with a <= so that we always just round up to 1522 for anything
that isn't a jumbo frame.

Fixes: c751a3d5 ("e1000e: Correctly include VLAN_HLEN when changing interface MTU")
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

8084b86d
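
The frame-size arithmetic in the message above can be checked with a small userspace sketch. The constants mirror the values in the Linux `if_ether.h` and `if_vlan.h` headers; the helper names (`max_frame_size`, `is_jumbo_old`, `is_jumbo_new`) are made up for illustration and are not the driver's functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Values as defined in the Linux if_ether.h / if_vlan.h headers. */
#define ETH_HLEN           14    /* Ethernet header */
#define ETH_FCS_LEN        4     /* frame check sequence */
#define VLAN_HLEN          4     /* 802.1Q tag */
#define ETH_FRAME_LEN      1514  /* largest untagged frame, without FCS */
#define VLAN_ETH_FRAME_LEN 1518  /* largest VLAN-tagged frame, without FCS */

/* Hypothetical helper: largest on-wire frame for a given MTU. */
static int max_frame_size(int mtu)
{
    assert(mtu > 0);
    return mtu + ETH_HLEN + VLAN_HLEN + ETH_FCS_LEN;
}

/* Threshold that ignores the VLAN tag: MTU 1500 gives 1522 > 1518. */
static bool is_jumbo_old(int mtu)
{
    return max_frame_size(mtu) > ETH_FRAME_LEN + ETH_FCS_LEN;
}

/* Threshold that accounts for the VLAN tag: 1522 > 1522 is false. */
static bool is_jumbo_new(int mtu)
{
    return max_frame_size(mtu) > VLAN_ETH_FRAME_LEN + ETH_FCS_LEN;
}
```

With these definitions an MTU of 1500 counts as jumbo under the old threshold but not under the new one, which is the second issue the message describes.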

e100: don't initialize int object to zero · ac7c1c5a

Jean Sacren authored May 02, 2015

'err' will be overwritten so no need to initialize it to zero.
Signed-off-by: Jean Sacren <sakiwit@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ac7c1c5a

igb: simplify and clean up igb_enable_mas() · 8cfb879d

Todd Fujinaka authored May 02, 2015

igb_enable_mas() should only be called for the 82575 and has no clear
return so changing it to void. Also simplify the odd conditional
expression.
Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

8cfb879d

Merge branch 'via-rhine-rework' · 4256af62

David S. Miller authored May 04, 2015

Francois Romieu says:

====================
via-rhine rework

The series applies against davem-next as of
9dd3c797 ("drivers: net: xgene: fix kbuild
warnings").

Patches #1..#4 avoid holes in the receive ring.

Patch #5 is a small leftover cleanup for #1..#4.

Patches #6 and #7 are fairly simple barrier stuff.

Patch #8 closes some SMP transmit races - not that anyone really
complained about these but it's a bit hard to handwave that they
can be safely ignored. Some testing, especially SMP testing of
course, would be welcome.

. Changes since #2:
  - added dma_rmb barrier in vlan related patch 6.
  - s/wmb/dma_wmb/ in (*new*) patch 7 of 8.
  - added explicit SMP barriers in (*new*) patch 8 of 8.

. Changes since #1:
  - turned wmb() into dma_wmb() as suggested by davem and Alexander Duyck
    in patch 1 of 6.
  - forgot to reset rx_head_desc in rhine_reset_rbufs in patch 4 of 6.
  - removed rx_head_desc altogether in (*new*) patch 5 of 6
  - removed some vlan receive ugliness in (*new*) patch 6 of 6.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

4256af62

via-rhine: close SMP transmit races. · 3a5a883a

françois romieu authored May 01, 2015

7ab87ff4 ("via-rhine: move work from
irq handler to softirq and beyond") forgot to explicitly control the
lifespan of the tx_dirty and tx_cur pointers.
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3a5a883a

via-rhine: dma_wmb transmit barrier. · e1efa872

françois romieu authored May 01, 2015

Follow the now usual transmit descriptor update path:
1. content change
2. dma_wmb
3. ownership change
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e1efa872
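
The three-step update order listed above can be sketched in userspace C. This is only an analogy: `atomic_thread_fence(memory_order_release)` stands in for the kernel's `dma_wmb()`, and the descriptor layout, the `DESC_OWN` bit and `tx_desc_publish()` are assumptions for illustration, not the via-rhine definitions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define DESC_OWN 0x80000000u  /* hypothetical "owned by the device" bit */

struct tx_desc {              /* hypothetical descriptor layout */
    uint32_t addr;
    uint32_t len;
    _Atomic uint32_t status;
};

static void tx_desc_publish(struct tx_desc *d, uint32_t addr, uint32_t len)
{
    /* The CPU may only write a descriptor it currently owns. */
    assert(!(atomic_load_explicit(&d->status, memory_order_relaxed) & DESC_OWN));

    /* 1. content change */
    d->addr = addr;
    d->len = len;

    /* 2. barrier (userspace stand-in for dma_wmb()) */
    atomic_thread_fence(memory_order_release);

    /* 3. ownership change: the consumer may now read addr and len */
    atomic_store_explicit(&d->status, DESC_OWN, memory_order_relaxed);
}
```

The point of the ordering is that whoever observes the ownership bit set is guaranteed to also observe the descriptor contents written before the barrier.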

via-rhine: add consistent memory barrier in vlan receive code. · 810f19bc

françois romieu authored May 01, 2015

The NAPI receive path depends on desc->rx_status but it does not
enforce any explicit receive barrier.
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

810f19bc

via-rhine: kiss rx_head_desc goodbye. · 62ca1ba0

françois romieu authored May 01, 2015

The driver no longer produces holes in its receive ring so rx_head_desc
only duplicates cur_rx.
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

62ca1ba0

via-rhine: forbid holes in the receive descriptor ring. · 8709bb2c

françois romieu authored May 01, 2015

Rationales:
- throttle work under memory pressure
- lower receive descriptor recycling latency for the network adapter
- lower the maintenance burden of uncommon paths

The patch is twofold:
- it fails early if the receive ring can't be completely initialized
  at dev->open() time
- it drops packets on the floor in the napi receive handler so as to
  keep the received ring full
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8709bb2c
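
A minimal userspace sketch of the two-part policy described above, assuming a toy ring of `malloc`ed buffers. The structure and function names are hypothetical, and the real driver also has to map buffers for DMA, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define RX_RING_SIZE 4
#define RX_BUF_SIZE  1536

struct rx_ring {                 /* hypothetical ring, not the driver's */
    void *buf[RX_RING_SIZE];
    unsigned int cur;
    unsigned long dropped;
};

/* Fail early: either every slot gets a buffer or initialization fails. */
static int rx_ring_init(struct rx_ring *r)
{
    for (size_t i = 0; i < RX_RING_SIZE; i++) {
        r->buf[i] = malloc(RX_BUF_SIZE);
        if (!r->buf[i]) {
            while (i--)
                free(r->buf[i]);
            return -1;
        }
    }
    r->cur = 0;
    r->dropped = 0;
    return 0;
}

/*
 * Handle one received packet. 'fresh' is the replacement buffer, or NULL
 * when allocation failed. Returns the buffer handed up the stack, or NULL
 * when the packet is dropped and its buffer is recycled in place.
 */
static void *rx_ring_receive(struct rx_ring *r, void *fresh)
{
    unsigned int slot = r->cur;
    void *pkt = NULL;

    r->cur = (r->cur + 1) % RX_RING_SIZE;
    if (fresh) {
        pkt = r->buf[slot];
        r->buf[slot] = fresh;
    } else {
        r->dropped++;            /* drop on the floor, keep the old buffer */
    }
    assert(r->buf[slot] != NULL); /* the ring never has a hole */
    return pkt;
}
```

Either way the slot ends up with a valid buffer, so the ring stays full and the receive path needs no special case for empty slots.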

via-rhine: gotoize rhine_open error path. · 4d1fd9c1

françois romieu authored May 01, 2015

Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4d1fd9c1

via-rhine: allocate and map receive buffer in a single transaction · a21bb8ba

françois romieu authored May 01, 2015

It's used to initialize the receive ring but it will actually shine when
the receive poll code is reworked.
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a21bb8ba

via-rhine: commit receive buffer address before descriptor status update. · e45af497

françois romieu authored May 01, 2015

Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e45af497

Merge branch 'flow_keys_digest' · 7c9a2eea

David S. Miller authored May 04, 2015

Tom Herbert says:

====================
net: Eliminate calls to flow_dissector and introduce flow_keys_digest

In this patch set we add skb_get_hash_perturb which gets the skbuff
hash for a packet and perturbs it using a provided key and jhash1.
This function is used in several qdiscs and eliminates many calls
to flow_dissector and jhash3 to get a perturbed hash for a packet.

To handle the sch_choke issue (passes flow_keys in skbuff cb) we
add flow_keys_digest which is a digest of a flow constructed
from a flow_keys structure.

This is the second version of these patches I posted a while ago,
and is prerequisite work to increasing the size of the flow_keys
structure and hashing over it (full IPv6 address, flow label, VLAN ID,
etc.).

Version 2:

- Add keyval parameter to __flow_hash_from_keys which allows caller to
  set the initval for jhash
- Perturb always does flow dissection and creates hash based on
  input perturb value which acts as the keyval to __flow_hash_from_keys
- Added a _flow_keys_digest_data which is used in make_flow_keys_digest.
  This fills out the digest by populating individual fields instead
  of copying the whole structure.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

7c9a2eea

sch_choke: Use flow_keys_digest · 2e99403d

Tom Herbert authored May 01, 2015

Call make_flow_keys_digest to get a digest from flow keys and
use that to pass skbuff cb and for comparing flows.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2e99403d

net: Add flow_keys digest · 2f59e1eb

Tom Herbert authored May 01, 2015

Some users of flow keys (well just sch_choke now) need to pass
flow_keys in skbuff cb, and use them for exact comparisons of flows
so that skb->hash is not sufficient. In order to increase size of
the flow_keys structure, we introduce another structure for
the purpose of passing flow keys in skbuff cb. We limit this structure
to sixteen bytes, and we will technically treat this as a digest of
flow_keys struct hence its name flow_keys_digest. In the first
incarnation we just copy the flow_keys structure up to 16 bytes--
this is the same information previously passed in the cb. In the
future, we'll adapt this for larger flow_keys and could use something
like SHA-1 over the whole flow_keys to improve the quality of the
digest.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2f59e1eb
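
The digest idea can be sketched as follows. The struct layouts and helper names here are simplified assumptions rather than the kernel's `flow_keys` or `flow_keys_digest` definitions; the sketch fills the 16-byte digest field by field, as the version 2 notes of the cover letter describe, so that extra fields in the larger structure do not leak into it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FLOW_KEYS_DIGEST_LEN 16

/* Hypothetical, simplified flow keys; 'extra' models future growth. */
struct flow_keys_sketch {
    uint32_t src, dst;
    uint32_t ports;
    uint16_t n_proto;
    uint8_t  ip_proto;
    uint8_t  extra[17];
};

struct flow_digest_sketch {
    uint8_t data[FLOW_KEYS_DIGEST_LEN];
};

/* Populate the digest from individual fields, not a whole-struct copy. */
static void make_digest(struct flow_digest_sketch *d,
                        const struct flow_keys_sketch *k)
{
    size_t off = 0;

    memset(d, 0, sizeof(*d));
    memcpy(d->data + off, &k->src, sizeof(k->src));           off += sizeof(k->src);
    memcpy(d->data + off, &k->dst, sizeof(k->dst));           off += sizeof(k->dst);
    memcpy(d->data + off, &k->ports, sizeof(k->ports));       off += sizeof(k->ports);
    memcpy(d->data + off, &k->n_proto, sizeof(k->n_proto));   off += sizeof(k->n_proto);
    memcpy(d->data + off, &k->ip_proto, sizeof(k->ip_proto)); off += sizeof(k->ip_proto);
    assert(off <= FLOW_KEYS_DIGEST_LEN);
}

/* Exact flow comparison on the fixed-size digest. */
static bool same_flow(const struct flow_digest_sketch *a,
                      const struct flow_digest_sketch *b)
{
    return memcmp(a->data, b->data, FLOW_KEYS_DIGEST_LEN) == 0;
}
```

Because the digest has a fixed size, it fits in a fixed slot such as the skbuff cb no matter how large the full key structure later becomes.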

sched: Call skb_get_hash_perturb in sch_sfq · ada1dba0

Tom Herbert authored May 01, 2015

Call skb_get_hash_perturb instead of doing skb_flow_dissect and then
jhash by hand.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ada1dba0

sched: Call skb_get_hash_perturb in sch_sfb · 63c0ad4d

Tom Herbert authored May 01, 2015

Call skb_get_hash_perturb instead of doing skb_flow_dissect and then
jhash by hand.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

63c0ad4d

sched: Call skb_get_hash_perturb in sch_hhf · f969777a

Tom Herbert authored May 01, 2015

Call skb_get_hash_perturb instead of doing skb_flow_dissect and then
jhash by hand.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f969777a

sched: Call skb_get_hash_perturb in sch_fq_codel · 342db221

Tom Herbert authored May 01, 2015

Call skb_get_hash_perturb instead of doing skb_flow_dissect and then
jhash by hand.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

342db221

net: Add skb_get_hash_perturb · 50fb7992

Tom Herbert authored May 01, 2015

This calls flow_dissect and __skb_get_hash to procure a hash for a
packet. Input includes a key to initialize jhash. This function
does not set skb->hash.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

50fb7992

net: ipv4: route: Fix sending IGMP messages with link address · 6a211654

Andrew Lunn authored May 01, 2015

In setups with a global scope address on an interface, and a lesser
scope address on an interface sending IGMP reports, the reports can be
sent using the other interface's global scope address rather than the
local interface address. RFC 2236 suggests:

     Ignore the Report if you cannot identify the source address of
     the packet as belonging to a subnet assigned to the interface on
     which the packet was received.

since such reports could be forged.

Look at the protocol when deciding if a RT_SCOPE_LINK address should
be used for the packet.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

6a211654

net: sched: run ingress qdisc without locks · 087c1a60

Alexei Starovoitov authored Apr 30, 2015

TC classifiers/actions were converted to RCU by John in the series:
http://thread.gmane.org/gmane.linux.network/329739/focus=329739
and many follow on patches.
This is the last patch from that series that finally drops
ingress spin_lock.

Single cpu ingress+u32 performance goes from 22.9 Mpps to 24.5 Mpps.

In two cpu case when both cores are receiving traffic on the same
device and go into the same ingress+u32 the performance jumps
from 4.5 + 4.5 Mpps to 23.5 + 23.5 Mpps
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

087c1a60

Merge branch 'tcp_sack_rttm' · a89f96c9

David S. Miller authored May 03, 2015

Kenneth Klette Jonassen says:

====================
tcp: SACK RTTM changes for congestion control

This patch series improves SACK RTT measurements for congestion control:
  o Picks the latest sequence SACKed for RTT, i.e. most accurate delay
    signal.
  o Calls the congestion control's pkts_acked hook with SACK RTTMs
    even when not sequentially ACKing new data.

V2: amend misleading comment
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

a89f96c9

tcp: invoke pkts_acked hook on every ACK · 138998fd

Kenneth Klette Jonassen authored May 01, 2015

Invoking pkts_acked is currently conditioned on FLAG_ACKED:
receiving a cumulative ACK of new data, or ACK with SYN flag set.

Remove this condition so that CC may get RTT measurements from all SACKs.

Cc: Yuchung Cheng <ycheng@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Kenneth Klette Jonassen <kennetkl@ifi.uio.no>
Signed-off-by: David S. Miller <davem@davemloft.net>

138998fd

tcp: improve RTT from SACK for CC · 31231a8a

Kenneth Klette Jonassen authored May 01, 2015

tcp_sacktag_one() always picks the earliest sequence SACKed for RTT.
This might not make sense for congestion control in cases where:

  1. ACKs are lost, i.e. a SACK following a lost SACK covers both
     new and old segments at the receiver.
  2. The receiver disregards the RFC 5681 recommendation to immediately
     ACK out-of-order segments.

Give congestion control a RTT for the latest segment SACKed, which is the
most accurate RTT estimate, but preserve the conservative RTT for RTO.

Removes the call to skb_mstamp_get() in tcp_sacktag_one().

Cc: Yuchung Cheng <ycheng@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Kenneth Klette Jonassen <kennetkl@ifi.uio.no>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

31231a8a
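
A small sketch of the two RTT samples described above. It assumes the newly SACKed segments are given as send timestamps in sequence order and were never retransmitted, so the earliest sequence is also the oldest transmission. The struct and function names are hypothetical, not the kernel's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sack_rtt {
    int64_t rto_rtt_us;  /* conservative sample, from the earliest segment */
    int64_t cc_rtt_us;   /* sample for congestion control, from the latest */
};

/*
 * sent_us: send timestamps of the segments newly covered by one SACK,
 *          in sequence order (assumed non-decreasing).
 * now_us:  arrival time of the ACK carrying the SACK.
 */
static struct sack_rtt sack_rtt_sample(const int64_t *sent_us, size_t n,
                                       int64_t now_us)
{
    struct sack_rtt s;

    assert(n > 0);
    s.rto_rtt_us = now_us - sent_us[0];      /* longest delay, kept for RTO */
    s.cc_rtt_us  = now_us - sent_us[n - 1];  /* most recent delay signal */
    return s;
}
```

When a SACK covers both old and new segments, for example after an earlier SACK was lost, the two samples diverge, and only the second one reflects the current path delay.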

tcp: move struct tcp_sacktag_state to tcp_ack() · 196da974

Kenneth Klette Jonassen authored May 01, 2015

A later patch passes two values set in tcp_sacktag_one() to
tcp_clean_rtx_queue(). Prepare for passing them via struct tcp_sacktag_state.
Acked-by: Yuchung Cheng <ycheng@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Kenneth Klette Jonassen <kennetkl@ifi.uio.no>
Signed-off-by: David S. Miller <davem@davemloft.net>

196da974

Merge branch 'rhashtable-test' · 10308220

David S. Miller authored May 03, 2015

Thomas Graf says:

====================
rhashtable self-test improvements

This series improves the rhashtable self-test to:
  * Avoid allocation of test objects
  * Measure the time of test runs
  * Use the iterator to walk the table for consistency
  * Account for failed insertions due to memory pressure or
    utilization pressure
  * Ignore failed insertions when checking for consistency
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

10308220

rhashtable-test: Detect insertion failures · 67b7cbf4

Thomas Graf authored Apr 30, 2015

Account for failed inserts due to memory pressure or EBUSY and
ignore failed entries during the consistency check.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

67b7cbf4

rhashtable-test: Use walker to test bucket statistics · 246b23a7

Thomas Graf authored Apr 30, 2015

As resizes may continue to run in the background, use walker to
ensure we see all entries. Also print the encountered number
of rehashes queued up while traversing.

This may lead to warnings due to entries being seen multiple
times. We consider them non-fatal.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

246b23a7

rhashtable-test: Do not allocate individual test objects · fcc57020

Thomas Graf authored Apr 30, 2015

By far the most expensive part of the selftest was the allocation
of entries. Using a static array makes it possible to measure the
rhashtable operations themselves.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

fcc57020

rhashtable-test: Get rid of ptr in test_obj structure · c2c8a901

Thomas Graf authored Apr 30, 2015

This only blows up the size of the test structure for no gain
in test coverage. Reduces size of test_obj from 24 to 16 bytes.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

c2c8a901

rhashtable-test: Measure time to insert, remove & traverse entries · 1aa661f5

Thomas Graf authored Apr 30, 2015

Make the test configurable by exposing all relevant knobs as module
parameters.

Do several test runs and measure the average time it takes to
insert & remove all entries. Note, a deferred resize might still
continue to run in the background.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

1aa661f5

rhashtable-test: Remove unused TEST_NEXPANDS · f54e84b6

Thomas Graf authored Apr 30, 2015

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

f54e84b6

Merge branch 'eth_type_trans' · 7a852021

David S. Miller authored May 03, 2015

Alexander Duyck says:

====================
A few minor clean-ups to eth_type_trans

This series addresses a few minor issues I found in eth_type_trans that
allow us to gain back something like 3 or more cycles per packet.

The first change is to drop the byte swap since it isn't necessary.  On x86
we could just check the first byte and compare that against the upper 8
bits of the Ethertype to determine if we are dealing with a size value or
not.

The second makes it so that the value we read in to test for multicast can
be used for the address comparison.  This allows us to avoid a second read
of the destination address.

The final change is to avoid some unneeded instructions in computing the
Ethernet header pointer.  When we start the call the Ethernet header is at
skb->data, so we just use that rather than computing mac_header, and then
adding that back to skb->head.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

7a852021

etherdev: Use skb->data to retrieve Ethernet header instead of eth_hdr · 610986e7

Alexander Duyck authored Apr 30, 2015

Avoid recomputing the Ethernet header location and instead just use the
pointer provided by skb->data.  The problem with using eth_hdr is that the
compiler wasn't smart enough to realize that skb->head + skb->mac_header
was the same thing as skb->data before it added ETH_HLEN.  By just caching
it off before calling skb_pull_inline we can avoid a few unnecessary
instructions.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

610986e7

etherdev: Process is_multicast_ether_addr at same size as other operations · d54385ce

Alexander Duyck authored Apr 30, 2015

This change makes it so that we process the address in
is_multicast_ether_addr at the same size as the other calls.  This allows
us to avoid duplicate reads when used with other calls such as
is_zero_ether_addr or eth_addr_copy.  In addition I have added a 64 bit
version of the function so in eth_type_trans we can process the destination
address as a 64 bit value throughout.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d54385ce

etherdev: Avoid unnecessary byte swap in check for Ethertype · 849b920e

Alexander Duyck authored Apr 30, 2015

This change takes advantage of the fact that ETH_P_802_3_MIN is aligned to
512 so as a result we can actually ignore the lower 8b when comparing the
Ethertype to ETH_P_802_3_MIN.  This allows us to avoid a byte swap by simply
masking the value and comparing it to the byte swapped value for
ETH_P_802_3_MIN.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

849b920e
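
The reasoning above can be checked exhaustively in userspace. The sketch works on the two wire bytes of the type/length field rather than on a byte-swapped 16-bit value. `ETH_P_802_3_MIN` is the kernel's constant (0x0600); the helper names are made up for illustration and this is not the kernel's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ETH_P_802_3_MIN 0x0600  /* smallest value that is an Ethertype */

/* Reference check: rebuild the host-order value, then compare. */
static bool is_ethertype_full(uint8_t hi, uint8_t lo)
{
    uint16_t host = (uint16_t)((hi << 8) | lo);

    return host >= ETH_P_802_3_MIN;
}

/* Shortcut: the low byte of the threshold is zero, so only 'hi' matters. */
static bool is_ethertype_high_byte(uint8_t hi)
{
    return hi >= (ETH_P_802_3_MIN >> 8);
}

/* Count the 16-bit values on which the two checks disagree. */
static unsigned int count_mismatches(void)
{
    unsigned int bad = 0;

    assert((ETH_P_802_3_MIN & 0xFF) == 0);
    for (unsigned int v = 0; v <= 0xFFFF; v++) {
        uint8_t hi = (uint8_t)(v >> 8);
        uint8_t lo = (uint8_t)(v & 0xFF);

        if (is_ethertype_full(hi, lo) != is_ethertype_high_byte(hi))
            bad++;
    }
    return bad;
}
```

`count_mismatches()` returns zero, which is why masking off the low byte and comparing without a byte swap gives the same answer as the full comparison.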

ipv6: Flow label state ranges · 82a584b7

Tom Herbert authored Apr 29, 2015

This patch divides the IPv6 flow label space into two ranges:
0-7ffff is reserved for flow label manager, 80000-fffff will be
used for creating auto flow labels (per RFC6438). This only affects how
labels are set on transmit, it does not affect receive. This range split
can be disabled by sysctl.

Background:

IPv6 flow labels have been an unmitigated disappointment thus far
in the lifetime of IPv6. Support in HW devices to use them for ECMP
is lacking, and OSes don't turn them on by default. If we had these
we could get much better hashing in IPv6 networks without resorting
to DPI, possibly eliminating some of the motivations to define new
encaps in UDP just for getting ECMP.

Unfortunately, the initial specifications of IPv6 did not clarify
how they are to be used. There has always been a vague concept that
these can be used for ECMP, flow hashing, etc. and we do now have a
good standard for how to do this in RFC6438. The problem is that flow labels
can be either stateful or stateless (as in RFC6438), and we are
presented with the possibility that a stateless label may collide
with a stateful one.  Attempts to split the flow label space were
rejected in IETF. When we added support in Linux for RFC6438, we
could not turn on flow labels by default due to this conflict.

This patch splits the flow label space and should give us
a path to enabling auto flow labels by default for all IPv6 packets.
This is an API change so we need to consider compatibility with
existing deployment. The stateful range is chosen to be the lower
values in hopes that most uses would have chosen small numbers.

Once we resolve the stateless/stateful issue, we can proceed to
look at enabling RFC6438 flow labels by default (starting with
scaled testing).
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

82a584b7
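
The range split can be sketched with host-order arithmetic on the 20-bit label. The macro and function names are assumptions for illustration (the kernel works on network-order values and has its own names); `state_ranges` models the sysctl mentioned in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FLOWLABEL_MASK      0xFFFFFu  /* flow labels are 20 bits wide */
#define FLOWLABEL_AUTO_FLAG 0x80000u  /* top bit selects 80000-fffff */

/* Derive an automatic (stateless) flow label from a flow hash. */
static uint32_t auto_flow_label(uint32_t hash, bool state_ranges)
{
    uint32_t label = hash & FLOWLABEL_MASK;

    if (state_ranges)
        label |= FLOWLABEL_AUTO_FLAG;  /* keep out of the managed range */
    assert(label <= FLOWLABEL_MASK);
    return label;
}

/* True when a label lies in 0-7ffff, the flow label manager's range. */
static bool label_in_managed_range(uint32_t label)
{
    return (label & FLOWLABEL_MASK) < FLOWLABEL_AUTO_FLAG;
}
```

With the split enabled an automatic label can never collide with a label handed out by the flow label manager, at the cost of one bit of hash entropy.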

ipv6: Check RTF_LOCAL on rt->rt6i_flags instead of rt->dst.flags · 7035870d

Martin KaFai Lau authored May 03, 2015

In my earlier commit:
653437d0 ("ipv6: Stop /128 route from disappearing after pmtu update"),
there was a horrible typo.  Instead of checking RTF_LOCAL on
rt->rt6i_flags, it was checked on rt->dst.flags.  This patch fixes
it.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Hajime Tazaki <tazaki@sfc.wide.ad.jp>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

7035870d