- 12 Mar, 2014 27 commits
-
-
Jack Morgenstein authored
Since there is no connection between the MAC/VLAN and the GID when using IP-based addressing, the proxy QP1 (running on the slave) must pass the source MAC, destination MAC, and vlan_id information separately from the GID. Additionally, the host must pass the remote source MAC and vlan_id back to the slave. This is achieved as follows:

Outgoing MADs:
1. Source MAC: obtained from the CQ completion structure (struct ib_wc, smac field).
2. Destination MAC: obtained from the tunnel header.
3. vlan_id: obtained from the tunnel header.

Incoming MADs:
1. The source (i.e., remote) MAC and vlan_id are passed in the tunnel header to the proxy QP1.

VST mode support: For outgoing MADs, the vlan_id obtained from the header is discarded, and the vlan_id specified by the Hypervisor is used instead. For incoming MADs, the incoming vlan_id (in the wc) is discarded, and the "invalid" vlan (0xffff) is substituted when forwarding to the slave.

Signed-off-by: Moni Shoua <monis@mellanox.co.il> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
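As a purely illustrative aid, the extra per-MAD addressing information described above can be pictured as a small side-band structure; the struct and helper below are hypothetical and do not match the actual mlx4 tunnel header layout.

    /* Hypothetical side-band info carried with each tunneled MAD;
     * field names are illustrative only.
     */
    #include <linux/types.h>
    #include <linux/if_ether.h>

    struct roce_mad_addr_info {
            u8  smac[ETH_ALEN];     /* outgoing: from struct ib_wc smac field;
                                     * incoming: the remote (source) MAC */
            u8  dmac[ETH_ALEN];     /* outgoing: taken from the tunnel header */
            u16 vlan_id;            /* 0xffff = "invalid"/no vlan toward the slave */
    };

    /* VST mode: ignore the vlan_id from the header and use the VLAN the
     * hypervisor assigned to the slave.
     */
    static inline u16 roce_egress_vlan(bool vst, u16 hdr_vlan, u16 hv_vlan)
    {
            return vst ? hv_vlan : hdr_vlan;
    }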
-
Jack Morgenstein authored
The IB side of RoCE requires the MAC table index of the MAC address used by its QPs. To obtain the real MAC index, the IB side registers the MAC (increasing its ref count, and also returning the real MAC index) during the modify-qp sequence. This protects against the ETH side deleting or modifying that MAC table entry while the QP is active. Note that until the modify-qp command returns success, the MAC and VLAN information only has "candidate" status. If the modify-qp succeeds, the "candidate" info is promoted to the operational MAC/VLAN info for the QP. If the modify fails, the candidate MAC/VLAN is unregistered, and the old QP info is preserved.

The patch is a bit complex, because there are multiple QP transitions where the primary-path information may be modified: INIT-to-RTR and SQD-to-SQD, and similarly for the alternate-path information. Therefore the code must handle cases where path information has already been entered into the QP context by previous QP transitions.

For the MAC address, the success logic is as follows:
1. If there was no previous MAC, simply move the candidate MAC information to the operational information, and reset the candidate MAC info.
2. If there was a previous MAC, unregister it. Then move the MAC information from candidate to operational, and reset the candidate info (as in 1. above).

The MAC address failure logic is the same for all cases: unregister the candidate MAC, and reset the candidate MAC info. For VLAN registration, the logic is similar.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
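A minimal sketch of the success/failure handling described above, using hypothetical state and helper names (the real mlx4 code operates on the QP context and MAC table directly):

    #include <linux/types.h>

    /* Sketch only: candidate vs. operational MAC bookkeeping around modify-qp. */
    struct qp_mac_state {
            int candidate_mac_idx;          /* -1: no candidate registered */
            int operational_mac_idx;        /* -1: no operational MAC yet */
    };

    static void unregister_mac(int mac_idx)
    {
            /* placeholder: would drop the MAC table reference taken earlier */
    }

    static void qp_mac_commit(struct qp_mac_state *s, bool modify_qp_ok)
    {
            if (modify_qp_ok) {
                    /* Success: drop any previous MAC, promote the candidate. */
                    if (s->operational_mac_idx >= 0)
                            unregister_mac(s->operational_mac_idx);
                    s->operational_mac_idx = s->candidate_mac_idx;
            } else {
                    /* Failure: unregister the candidate, keep the old QP info. */
                    if (s->candidate_mac_idx >= 0)
                            unregister_mac(s->candidate_mac_idx);
            }
            s->candidate_mac_idx = -1;
    }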
-
Jack Morgenstein authored
The GIDs are statically distributed, as follows:

PF: gets 16 GIDs.
VFs: the remaining GIDs are divided evenly between the VFs activated by the driver. If the division is not even, lower-numbered VFs get an extra GID.

For an IB interface, the number of GIDs per guest remains as before: one GID per guest.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
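The split of the remaining GIDs is simple integer arithmetic. The standalone sketch below illustrates the "lower-numbered VFs get an extra GID" rule; the table size of 128 and the VF count are assumptions, only the PF share of 16 comes from the commit message.

    #include <stdio.h>

    #define TOTAL_GIDS 128   /* assumed GID table size */
    #define PF_GIDS    16    /* PF share, per the commit message */

    static int gids_for_vf(int vf, int num_vfs)
    {
            int remaining = TOTAL_GIDS - PF_GIDS;
            int base = remaining / num_vfs;
            int extra = remaining % num_vfs;

            /* Lower-numbered VFs (vf < extra) each get one additional GID. */
            return base + (vf < extra ? 1 : 0);
    }

    int main(void)
    {
            int num_vfs = 5;     /* example VF count: 112 GIDs -> 23,23,22,22,22 */

            for (int vf = 0; vf < num_vfs; vf++)
                    printf("VF %d: %d GIDs\n", vf, gids_for_vf(vf, num_vfs));
            return 0;
    }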
-
Jack Morgenstein authored
For IB transport, the host determines the slave GIDs. For ETH (RoCE), however, the slave's GID is determined by the IP address that the slave itself assigns to the ETH device used by RoCE. In this case, the slave must be able to write its GIDs to the HCA GID table (at the GID indices that the slave "owns"). This commit adds processing for the SET_PORT_GID_TABLE opcode modifier for the SET_PORT command wrapper (so that slaves may modify their GIDs for RoCE). Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jack Morgenstein authored
This requires the following modifications:
1. Fix build_mlx4_header to properly fill in the ETH fields.
2. Adjust mux and demux QP1 flow to support RoCE.

This commit still assumes only one GID per slave for RoCE. The commit enabling multiple GIDs is a subsequent commit, and is done separately because of its complexity. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Jon Maloy says: ==================== tipc: simplifications in socket and port layer After the removal of the tipc native API the relation between tipc_port and its API types is strictly one-to-one, i.e, the latter can now only be a socket API. This change opens up for simplifications both in the code, data and locking structure. We start with this series, where we ensure that port and socket structures are co-allocated. Note that the first commit in the series is unrelated to the above. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jon Paul Maloy authored
As an artefact from the native interface, the message sending functions in the port take a port ref as their first parameter, and then look up the corresponding port pointer in the registry, despite the fact that the only currently existing caller, tipc_sock, already knows this pointer. We change the signature of these functions to take a struct tipc_port* argument, and remove the redundant lookups. We also remove an unmotivated extra lookup in the function socket.c:auto_connect(), and, as the lookup functions tipc_port_deref() and ref_deref() now become unused, we remove these two functions. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
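A hypothetical before/after of such a signature change (the function names and bodies are placeholders, not the actual TIPC send path):

    #include <linux/socket.h>

    /* Before: the caller passed a reference and the callee re-resolved it:
     *
     *	int tipc_send(u32 ref, struct msghdr *m)
     *	{
     *		struct tipc_port *p_ptr = tipc_port_deref(ref);  // redundant lookup
     *		...
     *	}
     */

    struct tipc_port_example;   /* stand-in for struct tipc_port */

    /* After: the socket layer already holds the port pointer, so pass it directly. */
    static int tipc_send_example(struct tipc_port_example *p_ptr, struct msghdr *m)
    {
            /* ... build and send the message via p_ptr ... */
            return 0;
    }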
-
Jon Paul Maloy authored
The practice of naming variables in TIPC is inconsistent, sometimes even within the same file. In this commit we align variable names and declarations within socket.c, and function and macro names within socket.h. We also reduce the number of conversion macros to two, in order to make usage less obscure. These changes are purely cosmetic. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jon Paul Maloy authored
The three functions tipc_portimportance(), tipc_portunreliable() and tipc_portunreturnable(), and their corresponding tipc_set* functions, all grab port_lock when accessing the targeted port. This is unnecessary in the current code, since these calls are only made from within socket downcalls, already protected by sock_lock. We remove the redundant locking. Also, since the functions now become trivial one-liners, we move them to port.h and make them inline. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
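A rough illustration of moving a trivial, lock-free accessor into the header as an inline function; the struct layout and 'importance' field are assumptions, not the actual TIPC code.

    /* Sketch only: a trivial accessor kept lock-free and inlined in port.h. */
    struct tipc_port_example {
            unsigned int importance;
            /* ... */
    };

    /* No port_lock needed: callers are socket downcalls already holding sock_lock. */
    static inline unsigned int tipc_port_importance_example(struct tipc_port_example *p)
    {
            return p->importance;
    }

    static inline void tipc_port_set_importance_example(struct tipc_port_example *p,
                                                         unsigned int imp)
    {
            p->importance = imp;
    }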
-
Jon Paul Maloy authored
Due to the original one-to-many relation between port and user API layers, upcalls to the API have been performed via function pointers, installed in struct tipc_port at creation. Since this relation now always is one-to-one, we can instead use ordinary function calls. We remove the function pointers 'dispatcher' and 'wakeup' from struct tipc_port, and replace them with calls to the renamed functions tipc_sk_rcv() and tipc_sk_wakeup(). At the same time we change the name and signature of the functions tipc_createport() and tipc_deleteport() to reflect their new role as mere initialization/destruction functions. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jon Paul Maloy authored
After the removal of the tipc native API the relation between a tipc_port and its API types is strictly one-to-one, i.e, the latter can now only be a socket API. There is therefore no need to allocate struct tipc_port and struct sock independently. In this commit, we aggregate struct tipc_port into struct tipc_sock, hence saving both CPU cycles and structure complexity. There are no functional changes in this commit, except for the elimination of the separate allocation/freeing of tipc_port. All other changes are just adaptations to the new data structure. This commit also opens up for further code simplifications and code volume reduction, something we will do in later commits. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
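A hedged sketch of the co-allocation pattern (simplified stand-in structs, not the real TIPC ones): embedding one struct in the other lets a single allocation serve both, and container_of() recovers the outer object from the inner one.

    #include <linux/kernel.h>       /* container_of() */
    #include <linux/types.h>

    /* Simplified stand-ins for struct sock / struct tipc_port / struct tipc_sock. */
    struct sock_example { int dummy; };

    struct tipc_port_example {
            u32 ref;
            /* ... port state ... */
    };

    struct tipc_sock_example {
            struct sock_example sk;          /* socket part */
            struct tipc_port_example port;   /* embedded: allocated together with sk */
    };

    /* Recover the enclosing socket from a pointer to its embedded port. */
    static inline struct tipc_sock_example *tipc_sk_example(struct tipc_port_example *p)
    {
            return container_of(p, struct tipc_sock_example, port);
    }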
-
Jon Paul Maloy authored
The field 'peer_name' in struct tipc_sock is redundant, since this information is already available from tipc_port, to which tipc_sock has a reference. We remove the field, and ensure that peer node and peer port info is instead fetched via the functions that already exist for this purpose. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jon Paul Maloy authored
The lock for protecting the reference table is declared as an RWLOCK, although it is only used in write mode, never in read mode. We redefine it to become a spinlock. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
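For illustration, converting a write-only rwlock to a spinlock is mechanical; the lock and critical-section names below are placeholders, not the actual TIPC reference-table code.

    #include <linux/spinlock.h>

    /* Before: an rwlock that was only ever taken for writing.
     *	static DEFINE_RWLOCK(ref_table_lock);
     *	write_lock_bh(&ref_table_lock);  ...  write_unlock_bh(&ref_table_lock);
     *
     * After: a plain spinlock expresses the same exclusivity more cheaply.
     */
    static DEFINE_SPINLOCK(ref_table_lock);

    static void ref_table_update_example(void)
    {
            spin_lock_bh(&ref_table_lock);
            /* ... modify the reference table ... */
            spin_unlock_bh(&ref_table_lock);
    }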
-
Steffen Klassert authored
We leak an active timer, the hotcpu notifier and all allocated resources when we exit a namespace. Fix this by introducing a flow_cache_fini() function where we release the resources before we exit. Fixes: ca925cf1 ("flowcache: Make flow cache name space aware") Reported-by: Jakub Kicinski <moorray3@wp.pl> Tested-by: Jakub Kicinski <moorray3@wp.pl> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Fan Du <fan.du@windriver.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Joe Perches authored
The use of __constant_<foo> has been unnecessary for quite a while now. Make these uses consistent with the rest of the kernel. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
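Since the same cleanup appears in several commits below, one hedged example (protocol value chosen arbitrarily, not taken from any particular patch) shows the pattern: htons() already folds to a compile-time constant for constant input, so the __constant_ variant adds nothing.

    #include <linux/types.h>
    #include <linux/if_ether.h>     /* ETH_P_IP */
    #include <asm/byteorder.h>      /* htons() */

    /* Before:  if (skb->protocol == __constant_htons(ETH_P_IP)) ...
     * After:   if (skb->protocol == htons(ETH_P_IP)) ...
     */
    static inline bool is_ipv4_example(__be16 protocol)
    {
            return protocol == htons(ETH_P_IP);
    }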
-
Joe Perches authored
The use of __constant_<foo> has been unnecessary for quite a while now. Make these uses consistent with the rest of the kernel. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Joe Perches authored
The use of __constant_<foo> has been unnecessary for quite a while now. Make these uses consistent with the rest of the kernel. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Joe Perches authored
The use of __constant_<foo> has been unnecessary for quite a while now. Make these uses consistent with the rest of the kernel. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Joe Perches authored
The use of __constant_<foo> has been unnecessary for quite a while now. Make these uses consistent with the rest of the kernel. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Joe Perches authored
The use of __constant_<foo> has been unnecessary for quite a while now. Make these uses consistent with the rest of the kernel. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Joe Perches authored
The use of __constant_<foo> has been unnecessary for quite a while now. Make these uses consistent with the rest of the kernel. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Claudiu Manoil authored
priv is not instantiated at gfar_of_init() time, when parsing the DT for info on supported HW queues. Before the netdev can be allocated, the number of supported queues must be known. Because the number of supported queues depends on device type, move the compatibility checks before netdev allocation. Local variables are used to hold the operation-mode info before netdev allocation. This fixes the NULL accesses to priv->.. in gfar_of_init(). Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
hayeswang authored
Add support for dumping the tally counter via ethtool. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Thomas Stilwell authored
The rf233 and rf231 are sufficiently similar that we can treat rf233 like rf231. rf233 is missing some features that rf231 has, but we don't currently make use of them so there's nothing to handle differently yet. Should we add support in the future for rf231 *_NOCLK or SLEEP states, or PAD_IO drive strength, exceptions will need to be made for rf233. Signed-off-by: Thomas Stilwell <stilwellt@openlabs.co> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Eric Dumazet says: ==================== pkt_sched: allow scheduling points We have seen delays of more than 50ms in class or qdisc dumps, in case device is under high TX stress, even with the prior 4KB per skb limit. With the new 16KB limit, this could translate to 200ms delays. Add cond_resched() to give a chance to higher prio tasks to get cpu. But before doing so, we need to remove the rcu locking from tc_dump_qdisc() as David spotted. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
We have seen delays of more than 50ms in class or qdisc dumps, in case device is under high TX stress, even with the prior 4KB per skb limit. Add cond_resched() to give a chance to higher prio tasks to get cpu. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Like all rtnetlink dump operations, we hold RTNL in tc_dump_qdisc(), so we do not need to use rcu protection to protect list of netdevices. This will allow preemption to occur, thus reducing latencies. Following patch adds explicit cond_resched() calls. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
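A hedged sketch of the resulting pattern across these two patches (simplified, not the actual tc_dump_qdisc() code): with RTNL held instead of an RCU read section, the per-device loop may sleep, so a cond_resched() between devices bounds dump latency under TX stress.

    #include <linux/netdevice.h>
    #include <linux/rtnetlink.h>    /* ASSERT_RTNL() */
    #include <linux/sched.h>        /* cond_resched() */

    /* Simplified illustration of an RTNL-protected dump loop. */
    static void dump_all_qdiscs_example(struct net *net)
    {
            struct net_device *dev;

            ASSERT_RTNL();                  /* caller holds RTNL, no RCU needed */

            for_each_netdev(net, dev) {
                    /* ... dump this device's qdiscs into the netlink skb ... */
                    cond_resched();         /* let higher-prio tasks run */
            }
    }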
-
- 11 Mar, 2014 6 commits
-
-
stephen hemminger authored
The option code is taking IP address and putting it into a generic container. Force cast to silence sparse warnings. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
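For context only, the kind of cast that silences this class of sparse warning looks roughly like the following; the struct and field names are made up and do not reflect the actual bonding option code.

    #include <linux/compiler.h>
    #include <linux/types.h>

    /* Storing a big-endian IP address in a generic host-order container
     * trips a sparse endianness warning unless the cast is forced.
     */
    struct generic_opt_value_example {
            u64 value;
    };

    static void store_target_ip_example(struct generic_opt_value_example *opt,
                                        __be32 ip)
    {
            opt->value = (__force u32)ip;   /* intentional: silences sparse */
    }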
-
stephen hemminger authored
This patch fixes sparse warnings in vlan driver. It propagates the sparse __percpu attribute from alloc_percpu into netdev_alloc_pcpu_stats. I expect it may trigger additional sparse warnings from other drivers that are missing the __percpu attribute. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net>
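A minimal sketch of what propagating the annotation means (simplified, not the actual vlan driver or macro): the pointer returned by alloc_percpu() carries the __percpu address-space attribute, so the variable holding it must be declared with the same attribute or sparse complains.

    #include <linux/errno.h>
    #include <linux/percpu.h>
    #include <linux/types.h>
    #include <linux/u64_stats_sync.h>

    struct pcpu_stats_example {
            u64 rx_packets;
            u64 tx_packets;
            struct u64_stats_sync syncp;
    };

    struct my_priv_example {
            /* __percpu must appear here too, matching alloc_percpu()'s return type. */
            struct pcpu_stats_example __percpu *stats;
    };

    static int my_priv_init(struct my_priv_example *priv)
    {
            priv->stats = alloc_percpu(struct pcpu_stats_example);
            return priv->stats ? 0 : -ENOMEM;
    }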
-
Li RongQing authored
Packets which have L2 address different from ours should be already filtered before entering into ip6_forward(). Perform that check at the beginning to avoid processing such packets. Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
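One common way to express such an early check is shown below as a sketch (placeholder function, not necessarily the exact code added to ip6_forward()): anything the lower layer did not address to us at L2 is rejected before any further processing.

    #include <linux/skbuff.h>
    #include <linux/if_packet.h>    /* PACKET_HOST */

    /* PACKET_HOST means the frame's destination MAC was our own address;
     * a forwarding path would drop anything else at the very beginning.
     */
    static bool l2_addressed_to_us_example(const struct sk_buff *skb)
    {
            return skb->pkt_type == PACKET_HOST;
    }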
-
hayeswang authored
Call skb_cow_head() before editing the tx packet header. The header would be reallocated if it is shared. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
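A sketch of the pattern (placeholder function name, not the r8152 code): ensure the header area is private before editing it in place.

    #include <linux/errno.h>
    #include <linux/skbuff.h>

    static int tx_fixup_example(struct sk_buff *skb)
    {
            /* Reallocates the header if the skb data is shared or cloned. */
            if (skb_cow_head(skb, 0) < 0)
                    return -ENOMEM;

            /* Now it is safe to modify the packet headers in place,
             * e.g. for checksum offload fixups.
             */
            return 0;
    }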
-
Tobias Klauser authored
Instead of using an own copy of struct net_device_stats in struct cpsw_priv, use stats from struct net_device. Also remove the thus unnecessary .ndo_get_stats function, as it just returns dev->stats, which is the default. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Acked-by: Mugunthan V N <mugunthanvnm@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
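For illustration (function name is a placeholder): once a driver counts directly into netdev->stats, a .ndo_get_stats callback that merely returns &dev->stats can be dropped, since the networking core falls back to dev->stats by default.

    #include <linux/netdevice.h>

    /* Count into the core-provided stats instead of a private copy. */
    static void record_tx_example(struct net_device *ndev, unsigned int len)
    {
            ndev->stats.tx_packets++;
            ndev->stats.tx_bytes += len;
    }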
-
Eric Dumazet authored
It is not legal to create multiple kmem_caches with the same name. flowcache can use a single kmem_cache; there is no need for a per-netns one. Fixes: ca925cf1 ("flowcache: Make flow cache name space aware") Reported-by: Jakub Kicinski <moorray3@wp.pl> Tested-by: Jakub Kicinski <moorray3@wp.pl> Tested-by: Fan Du <fan.du@windriver.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 10 Mar, 2014 7 commits
-
-
Gu Zheng authored
We do not need to switch the net_ns if the target net_ns is the same as the current one, so here we add a pre-check of the net_ns to avoid this, as David suggested. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
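A sketch of the pre-check (placeholder function name, not the actual patched code): skip the namespace switch when source and target are identical.

    #include <net/net_namespace.h>

    static int maybe_change_net_ns_example(struct net *current_net,
                                           struct net *target_net)
    {
            if (net_eq(current_net, target_net))
                    return 0;       /* nothing to do: already in the target namespace */

            /* ... perform the actual namespace move ... */
            return 0;
    }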
-
Eric Dumazet authored
All skbs in the socket write queue should be properly timestamped. In the case of FastOpen, we special-case the SYN+DATA 'message', as we queue the two fallback skbs in the socket write queue: 1) the SYN message by itself; 2) the DATA segment by itself. We should make sure these skbs have proper timestamps. Add a WARN_ON_ONCE() to eventually catch future violations. Fixes: 740b0f18 ("tcp: switch rtt estimations to usec resolution") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Haiyang Zhang authored
Due to a bug in the Hyper-V host version 2008R2, we need to use a slightly smaller receive buffer size; otherwise the buffer will not be accepted by the legacy hosts. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alexander Aring authored
The correct offset of the 6lowpanfrag_max_datagram_size value in the proc entry ctl table is 3, not 2. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
K. Y. Srinivasan says: ==================== Drivers: net: hyperv: Enable various offloads This patch set enables both checksum as well as segmentation offload. As part of this effort I have enabled scatter-gather I/O as well. In version 2 of these patches, I addressed comments from David Miller and Dan Carpenter. In this version I have addressed the latest comments from David Miller. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
KY Srinivasan authored
Enable segmentation offload. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
KY Srinivasan authored
Enable send side checksum offload. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-