Commits · acbafeb1e9daa18d601e9d91b68925e863cc4f6e · Kirill Smelkov / linux

02 Sep, 2014 32 commits

be2net: add a few log messages · acbafeb1

Sathya Perla authored Sep 02, 2014

This patch adds the following log messages to help debugging
failure cases:
1) log FW version number: this is useful when driver initialization
fails and the FW version number cannot be queried via ethtool
2) per function resource limits for BEx chips: these values are
currently being printed only for Skyhawk and Lancer
3) PCI BAR mapping failure
4) function_mode/caps queried from FW: this helps catch any FW bugs
that could advertise wrong capabilities to the driver
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sock: deduplicate errqueue dequeue · 364a9e93

Willem de Bruijn authored Aug 31, 2014

sk->sk_error_queue is dequeued in four locations. All share the
exact same logic. Deduplicate.

Also collapse the two critical sections for dequeue (at the top of
the recv handler) and signal (at the bottom).

This moves signal generation for the next packet forward, which should
be harmless.

It also changes the behavior if the recv handler exits early with an
error. Previously, a signal for follow-up packets on the errqueue
would then not be scheduled. The new behavior, to always signal, is
arguably a bug fix.

For rxrpc, the change causes the same function to be called repeatedly
for each queued packet (because the recv handler == sk_error_report).
It is likely that all packets will fail for the same reason (e.g.,
memory exhaustion).

This code runs without sk_lock held, so it is not safe to trust that
sk->sk_err is immutable in between releasing q->lock and the subsequent
test. Introduce int err just to avoid this potential race.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
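
The dequeue-and-signal pattern described in this message can be sketched in userspace C. This is a simplified model, not the kernel code: `struct pkt`, `struct sock_model`, and the no-op `errq_lock()`/`errq_unlock()` helpers are hypothetical stand-ins for the sk_buff, the socket, and q->lock. The point it illustrates is that the error of the next queued packet is latched into a local `err` while the lock is held, and only that local copy is tested afterwards.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for sk_buff and the socket; not kernel types. */
struct pkt {
	struct pkt *next;
	int err;                /* error code carried by this queued packet */
};

struct sock_model {
	struct pkt *errq_head;  /* models sk->sk_error_queue */
	int sk_err;             /* models sk->sk_err */
	int reports;            /* counts calls that model sk_error_report */
};

/* No-op placeholders marking where q->lock would be taken and released. */
static void errq_lock(struct sock_model *sk)   { (void)sk; }
static void errq_unlock(struct sock_model *sk) { (void)sk; }

/*
 * Dequeue one packet and, inside the same critical section, copy the
 * error of the following packet into a local variable. The signalling
 * decision uses the local copy, so it cannot be affected by a change
 * to sk->sk_err after the lock is dropped.
 */
static struct pkt *errq_dequeue(struct sock_model *sk)
{
	struct pkt *p;
	int err = 0;

	assert(sk != NULL);

	errq_lock(sk);
	p = sk->errq_head;
	if (p) {
		sk->errq_head = p->next;
		p->next = NULL;
		if (sk->errq_head)
			err = sk->errq_head->err;
	}
	errq_unlock(sk);

	sk->sk_err = err;
	if (err)
		sk->reports++;  /* models signalling for the follow-up packet */

	return p;
}
```

With two packets queued, the first call returns the first packet and reports the second packet's error once; the second call returns the second packet and reports nothing.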

net-timestamp: expand documentation · 8fe2f761

Willem de Bruijn authored Aug 31, 2014

Expand Documentation/networking/timestamping.txt with new
interfaces and bytestream timestamping. Also minor
cleanup of the other text.

Import txtimestamp.c test of the new features.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'csums-next' · c5a65680

David S. Miller authored Sep 01, 2014

Tom Herbert says:

====================
net: Checksum offload changes - Part VII

I am working on overhauling RX checksum offload. Goals of this effort
are:

- Specify what exactly it means when driver returns CHECKSUM_UNNECESSARY
- Preserve CHECKSUM_COMPLETE through encapsulation layers
- Don't do skb_checksum more than once per packet
- Unify GRO and non-GRO csum verification as much as possible
- Unify the checksum functions (checksum_init)
- Simplify code

What is in this seventh patch set:

- Add skb->csum_bad. This allows a device or GRO to indicate that an
  invalid checksum was detected.
- Checksum unnecessary to checksum complete conversions.

With these changes, I believe that the third goal of the overhaul is
now mostly achieved. In the case of no encapsulation or one layer of
encapsulation, there should only be at most one skb_checksum over
each packet (between GRO and normal path). In the case of two layers
of encapsulation, it is still possible with the right combination of
non-zero and zero UDP checksums to have >1 skb_checksum. For instance:
IP>GRE(with csum)>IP>UDP(zero csum)>VXLAN>IP>UDP(non-zero csum),
would likely necessitate an skb_checksum in GRO and normal path.
This doesn't seem like a common scenario at all so I'm inclined to
not address this now; if multiple layers of encapsulation become
popular we can reassess.

Note that checksum conversion shows a nice improvement for RX VXLAN when
outer UDP checksum is enabled (12.65% CPU compared to 20.94%). This
comes not only from skipping checksum calculation on the host, but
also from allowing GRO for VXLAN in this case. Checksum
conversion does not help send side (which still needs to perform
a checksum on host). For that we will implement remote checksum offload
in a later patch
(http://tools.ietf.org/html/draft-herbert-remotecsumoffload-00).

Please review carefully and test if possible, mucking with basic
checksum functions is always a little precarious :-)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

l2tp: Enable checksum unnecessary conversions for l2tp/UDP sockets · 72297c59

Tom Herbert authored Aug 31, 2014

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: Enable checksum unnecessary conversions for vxlan/UDP sockets · c60c308c

Tom Herbert authored Aug 31, 2014

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

gre: Add support for checksum unnecessary conversions · 884d338c

Tom Herbert authored Aug 31, 2014

Call skb_checksum_try_convert and skb_gro_checksum_try_convert
after checksum is found present and validated in the GRE header
for normal and GRO paths respectively.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

udp: Add support for doing checksum unnecessary conversion · 2abb7cdc

Tom Herbert authored Aug 31, 2014

Add support for doing CHECKSUM_UNNECESSARY to CHECKSUM_COMPLETE
conversion in UDP tunneling path.

In the normal UDP path, we call skb_checksum_try_convert after locating
the UDP socket. The check is that checksum conversion is enabled for
the socket (new flag in UDP socket) and that checksum field is
non-zero.

In the UDP GRO path, we call skb_gro_checksum_try_convert after
checksum is validated and checksum field is non-zero. Since this is
already in GRO we assume that checksum conversion is always wanted.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Infrastructure for checksum unnecessary conversions · d96535a1

Tom Herbert authored Aug 31, 2014

For normal path, added skb_checksum_try_convert which is called
to attempt to convert CHECKSUM_UNNECESSARY to CHECKSUM_COMPLETE. The
primary condition to allow this is that ip_summed is CHECKSUM_NONE
and csum_valid is true, which will be the state after consuming
a CHECKSUM_UNNECESSARY.

For GRO path, added skb_gro_checksum_try_convert which is the GRO
analogue of skb_checksum_try_convert. The primary condition to allow
this is that NAPI_GRO_CB(skb)->csum_cnt == 0 and
NAPI_GRO_CB(skb)->csum_valid is set. This implies that we have consumed
all available CHECKSUM_UNNECESSARY checksums in the GRO path.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Support for csum_bad in skbuff · 5a212329

Tom Herbert authored Aug 31, 2014

This flag indicates that an invalid checksum was detected in the
packet. __skb_mark_checksum_bad helper function was added to set this.

Checksums can be marked bad from a driver or the GRO path (the latter
is implemented in this patch). csum_bad is checked in
__skb_checksum_validate_complete (i.e. calling that when ip_summed ==
CHECKSUM_NONE).

csum_bad works in conjunction with ip_summed value. In the case that
ip_summed is CHECKSUM_NONE and csum_bad is set, this implies that the
first (or next) checksum encountered in the packet is bad. When
ip_summed is CHECKSUM_UNNECESSARY, the first checksum after the last
one validated is bad. For example, if ip_summed == CHECKSUM_UNNECESSARY,
csum_level == 1, and csum_bad is set, then the third checksum in the
packet is bad. In the normal path, the packet will be dropped when
processing the protocol layer of the bad checksum:
__skb_decr_checksum_unnecessary is called twice for the good checksums,
changing ip_summed to CHECKSUM_NONE so that
__skb_checksum_validate_complete is called to validate the third
checksum, and that will fail since csum_bad is set.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
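
The worked example in this message (CHECKSUM_UNNECESSARY, csum_level == 1, csum_bad set) can be traced with a small model. The types and function names below are hypothetical and greatly simplified; in particular, the model assumes that any software-validated checksum passes unless csum_bad is set.

```c
#include <assert.h>
#include <stdbool.h>

enum rx_csum { RX_NONE, RX_UNNECESSARY };

/* Hypothetical model of the three sk_buff fields discussed above. */
struct rx_model {
	enum rx_csum ip_summed;
	unsigned int csum_level;  /* number of validated checksums minus one */
	bool csum_bad;
};

/* Consume one device-validated checksum, as each protocol layer would. */
static void decr_checksum_unnecessary(struct rx_model *skb)
{
	assert(skb->ip_summed == RX_UNNECESSARY);

	if (skb->csum_level == 0)
		skb->ip_summed = RX_NONE;
	else
		skb->csum_level--;
}

/* Returns true if the checksum at the current layer is accepted. */
static bool validate_layer(struct rx_model *skb)
{
	if (skb->ip_summed == RX_UNNECESSARY) {
		decr_checksum_unnecessary(skb);
		return true;
	}
	/* RX_NONE: software validation, which csum_bad forces to fail. */
	return !skb->csum_bad;
}

/* 1-based index of the first rejected layer, or 0 if all layers pass. */
static int first_bad_layer(struct rx_model *skb, int layers)
{
	for (int i = 1; i <= layers; i++)
		if (!validate_layer(skb))
			return i;
	return 0;
}
```

Starting from `{ RX_UNNECESSARY, 1, true }`, the first two layers consume the validated checksums and the third layer is the one rejected, matching the description above.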

r8152: rename rx_buf_sz · 52aec126

hayeswang authored Sep 02, 2014

The variable "rx_buf_sz" is used by both tx and rx buffers. Replace
it with "agg_buf_sz".
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: mdio-bcm-unimac: NULL-terminate unimac_mdio_ids · 4559154a

Florian Fainelli authored Aug 29, 2014

drivers/net/phy/mdio-bcm-unimac.c:195:37-38: unimac_mdio_ids is not NULL
terminated at line 195

Make sure of_device_id tables are NULL terminated
Generated by: scripts/coccinelle/misc/of_table.cocci
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
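
Why the terminating entry matters can be shown with a userspace sketch. `struct match_entry` is a hypothetical stand-in for struct of_device_id (which holds more fields and uses character arrays rather than a pointer), and the compatible strings are invented for illustration. The table walk has no length argument, so it stops only when it reaches the empty sentinel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct of_device_id; one field modelled. */
struct match_entry {
	const char *compatible;
};

/* The last, empty entry is the sentinel that ends the walk below. */
static const struct match_entry example_ids[] = {
	{ .compatible = "vendor,device-a" },
	{ .compatible = "vendor,device-b" },
	{ .compatible = NULL },  /* sentinel */
};

/* Walk the table until the sentinel; return the matching entry or NULL. */
static const struct match_entry *find_match(const struct match_entry *tbl,
					    const char *compat)
{
	assert(tbl != NULL && compat != NULL);

	for (; tbl->compatible; tbl++)
		if (strcmp(tbl->compatible, compat) == 0)
			return tbl;
	return NULL;
}
```

Without the sentinel, a lookup for a string that is not in the table would read past the end of the array.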

net: dsa: make dsa_pack_type static · 61b7363f

Florian Fainelli authored Aug 29, 2014

net/dsa/dsa.c:624:20: sparse: symbol 'dsa_pack_type' was not declared.
Should it be static?

Fixes: 3e8a72d1 ("net: dsa: reduce number of protocol hooks")
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: add slave_changelink support and use it for queue_id · 0f23124a

Nikolay Aleksandrov authored Aug 27, 2014

This patch adds support for slave_changelink to the bonding and uses it
to give the ability to change the queue_id of the enslaved devices via
netlink. It sets slave_maxtype and uses bond_changelink as a prototype for
bond_slave_changelink.
Example/test command after the iproute2 patch:
 ip link set eth0 type bond_slave queue_id 10

CC: David S. Miller <davem@davemloft.net>
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Suggested-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: whitespace fixes · 688d1945

stephen hemminger authored Aug 29, 2014

Fix places where there is space before tab, long lines, and
awkward if(){, double spacing etc. Add blank line after declaration/initialization.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: systemport: tell RXCHK if we are using Broadcom tags · d09d3038

Florian Fainelli authored Aug 28, 2014

When Broadcom tags are enabled, e.g. when interfaced to an Ethernet
switch, make sure that we tell the RXCHK engine that it should be
expecting a 4-byte Broadcom tag after the Ethernet MAC Source Address.

Use netdev_uses_dsa() to check for that condition since that will tell
us if a switch is attached to our network interface.

Fixes: 80105bef ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

pktgen: add flag NO_TIMESTAMP to disable timestamping · afb84b62

Jesper Dangaard Brouer authored Aug 28, 2014

When testing the TX limits of the stack, it is useful to be able to
disable the do_gettimeofday() timestamping on every packet.

This implements a pktgen flag NO_TIMESTAMP which will disable this
call to do_gettimeofday().

The performance change on my system (E5-2695) with skb_clone=0 goes
from TX 2,423,751 pps to 2,567,165 pps with flag NO_TIMESTAMP. Thus,
the cost of do_gettimeofday(), i.e. the saving, is approx. 23 ns per packet.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
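
The per-packet figure follows from converting each packet rate into a time budget and subtracting. A minimal sketch of that arithmetic, with hypothetical helper names:

```c
#include <assert.h>

/* Time budget per packet, in nanoseconds, at a given packets-per-second rate. */
static double ns_per_packet(double pps)
{
	assert(pps > 0.0);
	return 1e9 / pps;
}

/* Per-packet saving when the rate rises from pps_before to pps_after. */
static double ns_saved(double pps_before, double pps_after)
{
	return ns_per_packet(pps_before) - ns_per_packet(pps_after);
}
```

With the rates quoted above, 2,423,751 pps is about 412.6 ns per packet and 2,567,165 pps is about 389.5 ns, so `ns_saved(2423751.0, 2567165.0)` comes to roughly 23 ns.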

bnx2x: fix tunneled GSO over IPv6 · 05f8461b

Dmitry Kravkov authored Aug 28, 2014

Set correct bit for packed description.

Introduced in e42780b6
    bnx2x: Utilize FW 7.10.51
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Dmitry Kravkov <Dmitry.Kravkov@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: prevent incorrect byte-swap in BE · 55ef5c89

Dmitry Kravkov authored Aug 28, 2014

Fixes incorrectly defined struct in FW HSI for BE platform.
Affects tunneling, tx-switching and anti-spoofing.

Introduced in e42780b6
    bnx2x: Utilize FW 7.10.51
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Dmitry Kravkov <Dmitry.Kravkov@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: add name distributor resiliency queue · a5325ae5

Erik Hugne authored Aug 28, 2014

TIPC name table updates are distributed asynchronously in a cluster,
entailing a risk of certain race conditions. E.g., if two nodes
simultaneously issue conflicting (overlapping) publications, this may
not be detected until both publications have reached a third node, in
which case one of the publications will be silently dropped on that
node. Hence, we end up with an inconsistent name table.

In most cases this conflict is just a temporary race, e.g., one
node is issuing a publication under the assumption that a previous,
conflicting, publication has already been withdrawn by the other node.
However, because of the (rtt related) distributed update delay, this
may not yet hold true on all nodes. The symptom of this failure is a
syslog message: "tipc: Cannot publish {%u,%u,%u}, overlap error".

In this commit we add a resiliency queue at the receiving end of
the name table distributor. When insertion of an arriving publication
fails, we retain it in this queue for a short amount of time, assuming
that another update will arrive very soon and clear the conflict. If
that happens, we insert the publication; otherwise we drop it.

The (configurable) retention value defaults to 2000 ms. Knowing from
experience that the situation described above is extremely rare, there
is no risk that the queue will accumulate any large number of items.
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
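
The retention idea can be sketched as follows. This is an illustrative model only, not the TIPC implementation: the fixed-size arrays, the `struct publ` layout, and all function names are assumptions made for the example; only the 2000 ms default comes from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RETENTION_MS 2000u  /* default retention quoted in the commit text */
#define MAX_ENTRIES  8      /* arbitrary bound for this sketch */

/* Hypothetical publication: a service type and an inclusive range. */
struct publ { unsigned int type, lower, upper; };

struct deferred { struct publ p; unsigned long expires_ms; bool used; };

struct name_table {
	struct publ entries[MAX_ENTRIES];
	size_t count;
	struct deferred backlog[MAX_ENTRIES];
};

static bool overlaps(const struct publ *a, const struct publ *b)
{
	return a->type == b->type && a->lower <= b->upper && b->lower <= a->upper;
}

/* Insert unless the publication overlaps an existing one or the table is full. */
static bool table_insert(struct name_table *t, const struct publ *p)
{
	for (size_t i = 0; i < t->count; i++)
		if (overlaps(&t->entries[i], p))
			return false;
	if (t->count == MAX_ENTRIES)
		return false;
	t->entries[t->count++] = *p;
	return true;
}

/* Remove an exactly matching publication, if present. */
static void table_withdraw(struct name_table *t, const struct publ *p)
{
	for (size_t i = 0; i < t->count; i++) {
		const struct publ *e = &t->entries[i];

		if (e->type == p->type && e->lower == p->lower && e->upper == p->upper) {
			t->entries[i] = t->entries[--t->count];
			return;
		}
	}
}

/* Retry deferred publications; drop those whose retention time has passed. */
static void backlog_process(struct name_table *t, unsigned long now_ms)
{
	for (size_t i = 0; i < MAX_ENTRIES; i++) {
		struct deferred *d = &t->backlog[i];

		if (d->used && (table_insert(t, &d->p) || now_ms >= d->expires_ms))
			d->used = false;
	}
}

/* Handle an arriving publication: insert it, or park it in the backlog. */
static bool publ_receive(struct name_table *t, const struct publ *p,
			 unsigned long now_ms)
{
	assert(t != NULL && p != NULL);

	if (table_insert(t, p))
		return true;
	for (size_t i = 0; i < MAX_ENTRIES; i++) {
		if (!t->backlog[i].used) {
			t->backlog[i].p = *p;
			t->backlog[i].expires_ms = now_ms + RETENTION_MS;
			t->backlog[i].used = true;
			break;
		}
	}
	return false;  /* deferred, or dropped if the backlog is full */
}
```

In this model a conflicting publication waits in the backlog; if the older publication is withdrawn before the retention time runs out, the next call to `backlog_process()` inserts the deferred one, otherwise it is discarded.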

tipc: refactor name table updates out of named packet receive routine · f4ad8a4b

Erik Hugne authored Aug 28, 2014

We need to perform the same actions when processing deferred name
table updates, so this functionality is moved to a separate
function.
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8152: reduce the number of Tx · 1764bcd9

hayeswang authored Aug 28, 2014

Because the Tx has the features of stopping queue and aggregation,
we don't need many tx buffers. Change the tx number from 10 to 4
to reduce the usage of the memory. This could save 16K * 6 bytes
of memory.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'xmit_list' · 53fda7f7

David S. Miller authored Sep 01, 2014

David Miller says:

====================
net: Make dev_hard_start_xmit() work fundamentally on lists

After this patch set, dev_hard_start_xmit() will work fundamentally on
any and all SKB lists.

This opens the path for a clean implementation of pulling multiple
packets out during qdisc_restart(), and then passing that blob in one
shot to dev_hard_start_xmit().

There were two main architectural blockers to this:

1) The GSO handling: we kept the original GSO head SKB around simply
   because dev_hard_start_xmit() had no way to communicate to the
   caller how far into the segmented list it was able to go.  Now it
   can, so the head GSO can be liberated immediately.

   All of the special GSO head SKB destructor et al. handling goes
   away too.

2) Validation of VLAN, CSUM, and segmentation characteristics was being
   performed inside of dev_hard_start_xmit().  If we want to truly batch,
   we have to let the higher levels do this.  In particular, this is
   now dequeue_skb()'s job.

And with those two issues out of the way, it should now be trivial to
build experiments on top of this patch set, all of the framework
should be there now.  You could do something as simple as:

	skb = q->dequeue(q);
	if (skb)
		skb = validate_xmit_skb(skb, qdisc_dev(q));
	if (skb) {
		struct sk_buff *new, *head = skb;
		int limit = 5;

		do {
			new = q->dequeue(q);
			if (new)
				new = validate_xmit_skb(new, qdisc_dev(q));
			if (new) {
				skb->next = new;
				skb = new;
			}
		} while (new && --limit);
		skb = head;
	}

inside of the else branch of dequeue_skb().
Signed-off-by: David S. Miller <davem@davemloft.net>
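
The batching loop quoted in this message can be exercised outside the kernel with a small model. Everything here is a stand-in: `struct pkt` and `struct queue` replace the sk_buff and the qdisc, and `validate()` is a placeholder for validate_xmit_skb() that accepts every packet.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct sk_buff and a qdisc; not kernel types. */
struct pkt {
	struct pkt *next;
	int id;
};

struct queue {
	struct pkt *head;
};

static struct pkt *queue_dequeue(struct queue *q)
{
	struct pkt *p = q->head;

	if (p) {
		q->head = p->next;
		p->next = NULL;
	}
	return p;
}

/* Placeholder for validate_xmit_skb(); this model accepts every packet. */
static struct pkt *validate(struct pkt *p)
{
	return p;
}

/* Pull one packet plus up to `limit` more, chained through ->next. */
static struct pkt *dequeue_batch(struct queue *q, int limit)
{
	struct pkt *head, *tail, *nxt;

	assert(q != NULL && limit > 0);

	head = queue_dequeue(q);
	if (head)
		head = validate(head);
	if (!head)
		return NULL;

	tail = head;
	do {
		nxt = queue_dequeue(q);
		if (nxt)
			nxt = validate(nxt);
		if (nxt) {
			tail->next = nxt;
			tail = nxt;
		}
	} while (nxt && --limit);

	return head;
}
```

With ten packets queued and a limit of 5, the returned list holds six packets (the head plus five more) and four remain on the queue.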

net: xmit_list() becomes dev_hard_start_xmit(). · 8dcda22a

David S. Miller authored Sep 01, 2014

Now fundamentally we can process lists of SKBs as cheaply
as single packets.
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Don't keep around original SKB when we software segment GSO frames. · ce93718f

David S. Miller authored Aug 30, 2014

Just maintain the list properly by returning the head of the remaining
SKB list from dev_hard_start_xmit().
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Validate xmit SKBs right when we pull them out of the qdisc. · 50cbe9ab

David S. Miller authored Aug 30, 2014

Signed-off-by: David S. Miller <davem@davemloft.net>

net: Separate out SKB validation logic from transmit path. · eae3f88e

David S. Miller authored Aug 30, 2014

dev_hard_start_xmit() does two things, it first validates and
canonicalizes the SKB, then it actually sends it.

Make a set of helper functions for doing the first part.
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Have xmit_list() signal more==true when appropriate. · 95f6b3dd

David S. Miller authored Aug 29, 2014

Signed-off-by: David S. Miller <davem@davemloft.net>

net: Pass a "more" indication down into netdev_start_xmit() code paths. · fa2dbdc2

David S. Miller authored Aug 29, 2014

For now it will always be false.
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Move main gso loop out of dev_hard_start_xmit() into helper. · 7f2e870f

David S. Miller authored Aug 29, 2014

There is a slight policy change happening here as well.

The previous code would drop the entire rest of the GSO skb if any of
them got, for example, a congestion notification.

That makes no sense: anything NET_XMIT_MASK and below is something
like congestion or policing.  And in the congestion case it doesn't
even mean the packet was actually dropped.

Just continue until dev_xmit_complete() evaluates to false.
Signed-off-by: David S. Miller <davem@davemloft.net>
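
The policy described above can be illustrated with a short model. The constant values reflect my reading of include/linux/netdevice.h and should be treated as assumptions; `struct seg`, `xmit_complete()`, and `xmit_segments()` are hypothetical names, and each segment simply carries the return code a driver would have produced for it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed to mirror the kernel's values; verify against netdevice.h. */
#define NET_XMIT_SUCCESS 0x00
#define NET_XMIT_DROP    0x01
#define NET_XMIT_CN      0x02
#define NET_XMIT_MASK    0x0f
#define NETDEV_TX_BUSY   0x10

/* Models dev_xmit_complete(): codes below the mask count as "consumed". */
static bool xmit_complete(int rc)
{
	return rc < NET_XMIT_MASK;
}

struct seg {
	struct seg *next;
	int rc;  /* return code the model driver gives for this segment */
};

/*
 * Send segments in order. Congestion or policing codes do not stop the
 * loop; only a code for which xmit_complete() is false does. Returns the
 * first unsent segment (or NULL) and stores the number sent in *sent.
 */
static struct seg *xmit_segments(struct seg *s, int *sent)
{
	assert(sent != NULL);

	*sent = 0;
	while (s) {
		if (!xmit_complete(s->rc))
			return s;
		(*sent)++;
		s = s->next;
	}
	return NULL;
}
```

A congestion notification in the middle of the list no longer discards the remaining segments; the loop stops only at the first segment that reports NETDEV_TX_BUSY or a similar code.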

net: Create xmit_one() helper for dev_hard_start_xmit() · 2ea25513

David S. Miller authored Aug 29, 2014

Hopefully making the code a bit easier to read and digest.
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Do txq_trans_update() in netdev_start_xmit() · 10b3ad8c

David S. Miller authored Aug 29, 2014

That way we don't have to audit every call site to make sure it is
doing this properly.
Signed-off-by: David S. Miller <davem@davemloft.net>

30 Aug, 2014 8 commits

net: stmmac: fix warning from Sparse for socfpga · dace1b54

Ley Foon Tan authored Aug 28, 2014

Warning:
drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c:122:41:
sparse: cast removes address space of expression
drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c:122:38:
sparse: incorrect type in assignment (different address spaces)
Signed-off-by: Ley Foon Tan <lftan@altera.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'csums-next' · 030824e0

David S. Miller authored Aug 29, 2014

Tom Herbert says:

====================
net: Checksum offload changes - Part VI

I am working on overhauling RX checksum offload. Goals of this effort
are:

- Specify what exactly it means when driver returns CHECKSUM_UNNECESSARY
- Preserve CHECKSUM_COMPLETE through encapsulation layers
- Don't do skb_checksum more than once per packet
- Unify GRO and non-GRO csum verification as much as possible
- Unify the checksum functions (checksum_init)
- Simplify code

What is in this sixth patch set:

- Clarify the specific requirements of devices returning
  CHECKSUM_UNNECESSARY (comments in skbuff.h).
- Add csum_level field to skbuff. This is used to express how
  many checksums are covered by CHECKSUM_UNNECESSARY (stores n - 1).
- Change __skb_checksum_validate_needed to "consume" each checksum
  as indicated by csum_level as layers of the packet are parsed.
- Remove skb_pop_rcv_encapsulation, no longer needed in the new
  csum_level model.
- Allow GRO path to "consume" checksums provided in CHECKSUM_UNNECESSARY
  and to report new verified checksums for use in normal path fallback.
- Add proper support to SCTP to accept CHECKSUM_UNNECESSARY to validate
  header CRC.
- Modify drivers to set skb->csum_level instead of setting
  skb->encapsulation to indicate validation of an encapsulated
  checksum on receive.

v2:

Allocate a new 16 bits for flags in skbuff.

Please review carefully and test if possible, mucking with basic
checksum functions is always a little precarious :-)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

qlcnic: Set skb->csum_level for encapsulated checksum · 71d7a277

Tom Herbert authored Aug 27, 2014

Set skb->csum_level instead of skb->encapsulation when indicating
CHECKSUM_UNNECESSARY for an encapsulated checksum.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlx4: Set skb->csum_level for encapsulated checksum · 9ca8600e

Tom Herbert authored Aug 27, 2014

Set skb->csum_level instead of skb->encapsulation when indicating
CHECKSUM_UNNECESSARY for an encapsulated checksum.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

i40evf: Set skb->csum_level for encapsulated checksum · 407fa085

Tom Herbert authored Aug 27, 2014

Set skb->csum_level instead of skb->encapsulation when indicating
CHECKSUM_UNNECESSARY for an encapsulated checksum.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

i40e: Set skb->csum_level for encapsulated checksum · fa4ba69b

Tom Herbert authored Aug 27, 2014

Set skb->csum_level instead of skb->encapsulation when indicating
CHECKSUM_UNNECESSARY for an encapsulated checksum.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

benet: Set skb->csum_level for encapsulated checksum · b6c0e89d

Tom Herbert authored Aug 27, 2014

Set skb->csum_level instead of skb->encapsulation when indicating
CHECKSUM_UNNECESSARY for an encapsulated checksum.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: Change sctp to implement csum_levels · 202863fe

Tom Herbert authored Aug 27, 2014

CHECKSUM_UNNECESSARY may be applied to the SCTP CRC so we need to
appropriately account for this by decrementing csum_level. This is
done by calling __skb_decr_checksum_unnecessary.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
