Commits · be6e6707f6eec2048d9be608bc0ceecde5bd4cef · Kirill Smelkov / linux

15 Jun, 2016 30 commits

rxrpc: Rework peer object handling to use hash table and RCU · be6e6707

David Howells authored Apr 04, 2016

Rework peer object handling to use a hash table instead of a flat list and
to use RCU. Peer objects are no longer destroyed by passing them to a
workqueue to process, but rather are just passed to the RCU garbage
collector as kfree'able objects.

The hash function uses the local endpoint plus all the components of the
remote address, except for the RxRPC service ID. Peers thus represent a
UDP port on the remote machine as contacted by a UDP port on this machine.

The RCU read lock is used to handle non-creating lookups so that they can
be called from bottom half context in the sk_error_report handler without
having to lock the hash table against modification.
rxrpc_lookup_peer_rcu() *does* take a reference on the peer object as in
the future, this will be passed to a work item for error distribution in
the error_report path and this function will cease being used in the
data_ready path.

Creating lookups are done under spinlock rather than mutex as they might be
set up due to an external stimulus if the local endpoint is a server.

Captured network error messages (ICMP) are handled with respect to this
struct and MTU size and RTT are cached here.
Signed-off-by: David Howells <dhowells@redhat.com>
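
The keying rule described above can be illustrated with a small userspace sketch. It hashes a stand-in for the local endpoint together with the remote family, address and port, and leaves the service ID out. The struct layout, the names and the FNV-1a mixing are assumptions made for illustration; they are not taken from net/rxrpc.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified lookup key; the real rxrpc structures differ. */
struct peer_key_sketch {
	uintptr_t local_id;    /* stands in for the local endpoint */
	uint16_t  family;      /* remote address family */
	uint32_t  addr;        /* remote IPv4 address */
	uint16_t  port;        /* remote UDP port */
	uint16_t  service_id;  /* RxRPC service ID, deliberately not hashed */
};

/* FNV-1a step over a byte range; chosen only because it is short. */
static uint32_t mix_bytes(uint32_t h, const void *data, size_t len)
{
	const unsigned char *p = data;
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= p[i];
		h *= 16777619u;
	}
	return h;
}

static uint32_t peer_hash_sketch(const struct peer_key_sketch *k)
{
	uint32_t h = 2166136261u;

	h = mix_bytes(h, &k->local_id, sizeof(k->local_id));
	h = mix_bytes(h, &k->family, sizeof(k->family));
	h = mix_bytes(h, &k->addr, sizeof(k->addr));
	h = mix_bytes(h, &k->port, sizeof(k->port));
	return h;   /* k->service_id never contributes */
}
```

Two keys that differ only in service ID land in the same bucket under this sketch, which matches the idea that a peer represents a remote UDP port rather than a service.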

act_police: rename tcf_act_police_locate() to tcf_act_police_init() · d9fa17ef

WANG Cong authored Jun 13, 2016

This function is just ->init(), rename it to make it obvious.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net_sched: remove internal use of TC_POLICE_* · 95df1b16

WANG Cong authored Jun 13, 2016

These should have gone away when we removed CONFIG_NET_CLS_POLICE.
We cannot remove them entirely since they are exposed
to userspace.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'rds-mprds-foundations' · 161cd45f

David S. Miller authored Jun 14, 2016

Sowmini Varadhan says:

====================
RDS: multiple connection paths for scaling

Today RDS-over-TCP is implemented by demux-ing multiple PF_RDS sockets
between any 2 endpoints (where endpoint == [IP address, port]) over a
single TCP socket between the 2 IP addresses involved. This has the
limitation that it ends up funneling multiple RDS flows over a single
TCP flow, thus the rds/tcp connection is
   (a) upper-bounded to the single-flow bandwidth,
   (b) suffers from head-of-line blocking for the RDS sockets.

Better throughput (for a fixed small packet size, MTU) can be achieved
by having multiple TCP/IP flows per rds/tcp connection, i.e., multipathed
RDS (mprds).  Each such TCP/IP flow constitutes a path for the rds/tcp
connection. RDS sockets will be attached to a path based on some hash
(e.g., of local address and RDS port number) and packets for that RDS
socket will be sent over the attached path using TCP to segment/reassemble
RDS datagrams on that path.

The table below, generated using a prototype that implements mprds,
shows that this is significant for scaling to 40G.  Packet sizes
used were: 8K byte req, 256 byte resp. MTU: 1500.  The parameters for
RDS-concurrency used below are described in the rds-stress(1) man page;
the number listed is proportional to the number of threads at which max
throughput was attained.

  -------------------------------------------------------------------
     RDS-concurrency   Num of       tx+rx K/s (iops)       throughput
     (-t N -d N)       TCP paths
  -------------------------------------------------------------------
        16             1             600K -  700K            4 Gbps
        28             8            5000K - 6000K           32 Gbps
  -------------------------------------------------------------------

FAQ: what is the relation between mprds and mptcp?
  mprds is orthogonal to mptcp. Whereas mptcp creates
  sub-flows for a single TCP connection, mprds parallelizes tx/rx
  at the RDS layer. MPRDS with N paths will allow N datagrams to
  be sent in parallel; each path will continue to send one
  datagram at a time, with sender and receiver keeping track of
  the retransmit and dgram-assembly state based on the RDS header.
  If desired, mptcp can additionally be used to speed up each TCP
  path. That acceleration is orthogonal to the parallelization benefits
  of mprds.

This patch series lays down the foundational data-structures to support
mprds in the kernel. It implements the changes to split up the
rds_connection structure into a common (to all paths) part,
and a per-path rds_conn_path. All I/O workqs are driven from
the rds_conn_path.

Note that this patchset does not (yet) actually enable multipathing
for any of the transports; all transports will continue to use a
single path with the refactored data-structures. A subsequent patchset
will add the changes to the rds-tcp module to actually use mprds
in rds-tcp.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
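
The idea of attaching an RDS socket to a path "based on some hash" can be sketched as below. The path count, the function name and the mixing constant are assumptions for illustration only; the series itself does not yet enable multipathing, so no real hash from the kernel is being reproduced here.

```c
#include <stdint.h>

/* Hypothetical path count, chosen to match the 8-path row in the table. */
#define SKETCH_NUM_PATHS 8u

/* Map a bound local address and RDS port to a path index. */
static unsigned int pick_path_sketch(uint32_t local_addr, uint16_t rds_port)
{
	uint32_t h = local_addr ^ ((uint32_t)rds_port * 2654435761u);

	h ^= h >> 16;                 /* fold high bits into the low bits */
	return h % SKETCH_NUM_PATHS;  /* always in [0, SKETCH_NUM_PATHS) */
}
```

Because the mapping depends only on the socket's bound address and port, every datagram of a given socket would use the same path, while different sockets can spread across paths.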

RDS: Update rds_conn_destroy to be MP capable · 3ecc5693

Sowmini Varadhan authored Jun 13, 2016

Refactor rds_conn_destroy() so that the per-path dismantling
is done in rds_conn_path_destroy, and then iterate as needed
over rds_conn_path_destroy().
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
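
A minimal sketch of the "per-path teardown plus loop" shape follows. All names and the rule for how many paths to visit are hypothetical; the commit only states that rds_conn_destroy() iterates "as needed" over rds_conn_path_destroy().

```c
#define SKETCH_MAX_PATHS 8   /* hypothetical upper bound on paths */

struct path_sketch {
	int torn_down;
};

struct conn_sketch {
	int mp_capable;                          /* assumed per-transport property */
	struct path_sketch c_path[SKETCH_MAX_PATHS];
};

/* Per-path dismantling lives in its own helper. */
static void path_destroy_sketch(struct path_sketch *cp)
{
	cp->torn_down = 1;
}

/* The connection-level function only decides which paths to visit. */
static void conn_destroy_sketch(struct conn_sketch *conn)
{
	int npaths = conn->mp_capable ? SKETCH_MAX_PATHS : 1;
	int i;

	for (i = 0; i < npaths; i++)
		path_destroy_sketch(&conn->c_path[i]);
}
```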

RDS: Update rds_conn_shutdown to work with rds_conn_path · d769ef81

Sowmini Varadhan authored Jun 13, 2016

This commit changes rds_conn_shutdown to take a rds_conn_path *
argument, allowing it to shut down paths other than c_path[0] for
MP-capable transports.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

RDS: Initialize all RDS_MPATH_WORKERS in __rds_conn_create · 1c5113cf

Sowmini Varadhan authored Jun 13, 2016

Add a for() loop in __rds_conn_create to initialize all the
conn_paths, in preparation for MP capable transports.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

RDS: Add rds_conn_path_error() · fb1b3dc4

Sowmini Varadhan authored Jun 13, 2016

rds_conn_path_error() is the MP-aware analog of rds_conn_error,
to be used by multipath-capable callers.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

RDS: update rds-info related functions to traverse multiple conn_paths · 992c9ec5

Sowmini Varadhan authored Jun 13, 2016

This commit updates the callbacks related to the rds-info command
so that they walk through all the rds_conn_path structures and
report the requested info.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

RDS: Add rds_conn_path_connect_if_down() for MP-aware callers · 3c0a5900

Sowmini Varadhan authored Jun 13, 2016

rds_conn_path_connect_if_down() works on the rds_conn_path
that it is passed. Callers who are not t_mp_capable may continue
calling rds_conn_connect_if_down, which will invoke
rds_conn_path_connect_if_down() with the default c_path[0].
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
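
The compatibility pattern described here, a path-aware function plus a thin wrapper that defaults to c_path[0], can be sketched as follows. The types, field names and the counter standing in for "queue connect work" are hypothetical.

```c
#include <stdbool.h>

#define SKETCH_NPATHS 8   /* hypothetical number of paths per connection */

struct cpath_sketch {
	bool up;
	int  connect_requests;   /* stand-in for queueing connect work */
};

struct rconn_sketch {
	struct cpath_sketch c_path[SKETCH_NPATHS];
};

/* Path-aware variant: acts only on the path it is handed. */
static void path_connect_if_down_sketch(struct cpath_sketch *cp)
{
	if (!cp->up)
		cp->connect_requests++;
}

/* Legacy entry point for single-path callers: defaults to c_path[0]. */
static void conn_connect_if_down_sketch(struct rconn_sketch *conn)
{
	path_connect_if_down_sketch(&conn->c_path[0]);
}
```

Existing callers keep their signature, and only multipath-aware code needs to know about the per-path variant.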

RDS: Make rds_send_pong() take a rds_conn_path argument · 45997e9e

Sowmini Varadhan authored Jun 13, 2016

This commit allows rds_send_pong() callers to send back
the rds pong message on some path other than c_path[0] by
passing in a struct rds_conn_path * argument.  It also
removes the last dependency on the #defines in rds_single.h
from send.c
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

RDS: Extract rds_conn_path from i_conn_path in rds_send_drop_to() for MP-capable transports · 01ff34ed

Sowmini Varadhan authored Jun 13, 2016

Explicitly set up rds_conn_path, either from i_conn_path (for
MP capable transports) or as c_path[0], and use this in
rds_send_drop_to().
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

RDS: Pass rds_conn_path to rds_send_xmit() · 1f9ecd7e

Sowmini Varadhan authored Jun 13, 2016

Pass a struct rds_conn_path to rds_send_xmit so that MP capable
transports can transmit packets on something other than c_path[0].
The eventual goal for MP capable transports is to hash the rds
socket to a path based on the bound local address/port, and use
this path as the argument to rds_send_xmit()
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

RDS: Make rds_send_queue_rm() rds_conn_path aware · 780a6d9e

Sowmini Varadhan authored Jun 13, 2016

Pass the rds_conn_path to rds_send_queue_rm, and use it to initialize
the i_conn_path field in struct rds_incoming. This commit also makes
rds_send_queue_rm() MP capable, because it now takes locks
specific to the rds_conn_path passed in, instead of defaulting to
the c_path[0] based defines from rds_single_path.h
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

RDS: Remove stale function rds_send_get_message() · 7d885d0f

Sowmini Varadhan authored Jun 13, 2016

The only caller of rds_send_get_message() was
rds_iw_send_cq_comp_handler() which was removed as part of
commit dcdede04 ("RDS: Drop stale iWARP RDMA transport"),
so remove rds_send_get_message() for the same reason.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

RDS: Add rds_send_path_drop_acked() · 5c3d274c

Sowmini Varadhan authored Jun 13, 2016

rds_send_path_drop_acked() is the path-specific version of
rds_send_drop_acked() to be invoked by MP capable callers.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

RDS: Add rds_send_path_reset() · 4e9b551c

Sowmini Varadhan authored Jun 13, 2016

rds_send_path_reset() is the path specific version of rds_send_reset()
intended for MP capable callers.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

RDS: rds_inc_path_init() helper function for MP capable transports · 5e833e02

Sowmini Varadhan authored Jun 13, 2016

t_mp_capable transports can use rds_inc_path_init to initialize
all fields in struct rds_incoming, including the i_conn_path.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

RDS: recv path gets the conn_path from rds_incoming for MP capable transports · ef9e62c2

Sowmini Varadhan authored Jun 13, 2016

Transports that are t_mp_capable should set the rds_conn_path
on which the datagram was received in the ->i_conn_path field
of struct rds_incoming.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

RDS: add t_mp_capable bit to be set by MP capable transports · 7e8f4413

Sowmini Varadhan authored Jun 13, 2016

The t_mp_capable bit will be used in the core rds module
to support multipathing logic when the transport supports it.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

RDS: split out connection specific state from rds_connection to rds_conn_path · 0cb43965

Sowmini Varadhan authored Jun 13, 2016

In preparation for multipath RDS, split the rds_connection
structure into a base structure, and a per-path struct rds_conn_path.
The base structure tracks information and locks common to all
paths. The workqs for send/recv/shutdown etc are tracked per
rds_conn_path. Thus the workq callbacks now work with rds_conn_path.

This commit allows for one rds_conn_path per rds_connection, and will
be extended into multiple conn_paths in subsequent commits.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: return sizeof tcp_dctcp_info in dctcp_get_info() · dcf1158b

Neal Cardwell authored Jun 13, 2016

Make sure that dctcp_get_info() returns only the size of the
info->dctcp struct that it zeroes out and fills in. Previously it had
been returning the size of the enclosing tcp_cc_info union,
sizeof(*info).  There is no problem yet, but that union may one
day be larger than struct tcp_dctcp_info, in which case the
TCP_CC_INFO code might accidentally copy uninitialized bytes from the
stack.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
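
The size mismatch being guarded against can be shown with stand-in types. The structs below are hypothetical and deliberately include an imagined larger union member, which is the future case the commit message worries about; they are not the kernel's tcp_dctcp_info or tcp_cc_info definitions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the DCTCP info struct and its enclosing union. */
struct dctcp_info_sketch {
	uint16_t enabled;
	uint16_t ce_state;
	uint32_t alpha;
};

struct bigger_info_sketch {
	uint32_t fields[8];   /* imagined member that outgrows the DCTCP one */
};

union cc_info_sketch {
	struct dctcp_info_sketch  dctcp;
	struct bigger_info_sketch other;
};

/* Fill in only the DCTCP member and report exactly how much was written. */
static size_t get_info_sketch(union cc_info_sketch *info)
{
	memset(&info->dctcp, 0, sizeof(info->dctcp));
	info->dctcp.enabled = 1;
	return sizeof(info->dctcp);   /* not sizeof(*info) */
}
```

A caller that copies only the returned number of bytes never touches the part of the union that this function left uninitialized.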

sctp: fix error return code in sctp_init() · a5e27d18

Wei Yongjun authored Jun 13, 2016

Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
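
The bug class fixed here is an error path that jumps to cleanup while the status variable still holds 0. A minimal sketch of the corrected shape, with a fake allocator and hypothetical names standing in for the real initialization steps, might look like this:

```c
#include <errno.h>
#include <stddef.h>

/* Fake allocator so the failure case can be triggered deterministically. */
static void *fake_alloc_sketch(int succeed)
{
	static char buf[64];

	return succeed ? buf : NULL;
}

static int init_sketch(int alloc_succeeds)
{
	int status = 0;
	void *table = fake_alloc_sketch(alloc_succeeds);

	if (!table) {
		status = -ENOMEM;   /* without this line the caller would see 0 */
		goto err;
	}
	return 0;

err:
	/* cleanup of earlier steps would go here */
	return status;
}
```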

Merge tag 'rxrpc-rewrite-20160613' of... · d4c76c1a

David S. Miller authored Jun 14, 2016

Merge tag 'rxrpc-rewrite-20160613' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

David Howells says:

====================
rxrpc: Rename rxrpc source files

Here's the next part of the AF_RXRPC rewrite.  In this set I rename some of
the files in the net/rxrpc/ directory and adjust the Makefile and
ar-internal.h to reflect the changes.

The aim is twofold:

 (1) Remove the "ar-" prefix on those files that have it as it's not really
     useful, especially now that I'm building rxkad in.

 (2) To aid splitting the local, peer, connection and call handling code
     into separate files for object and event handling in future patches by
     making it easier to come up with new filenames.

There are two commits:

 (1) The first commit does a bunch of renames of .c files and alters the
     Makefile.  ar-internal.h isn't renamed at this time to avoid having to
     change the contents of the files being renamed.

 (2) The second commit changes the section label comments in ar-internal.h
     to reflect the changed filenames and reorders the file so that the
     sections are back in filename order.

The patches can be found here also:

	http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-rewrite

Tagged thusly:

	git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git
	rxrpc-rewrite-20160613
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns: update the dependency · 4a63538e

Kejian Yan authored Jun 13, 2016

Now that the patchset adding ACPI support (commit 63434888) has been
applied, HNS no longer depends on OF alone. It depends on OF or ACPI, so
the Kconfig file needs to be updated.
Signed-off-by: Kejian Yan <yankejian@huawei.com>
Signed-off-by: Yisen Zhuang <Yisen.Zhuang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'r8152-phy-adjustments' · 7d7549ed

David S. Miller authored Jun 14, 2016

Hayes Wang says:

====================
r8152: code adjustment for PHY

These patches adjust the PHY code and the speed-setting code.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

r8152: save the speed · aa7e26b6

hayeswang authored Jun 13, 2016

The user may change the speed. Use it to replace the default one.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8152: move the setting for the default speed · 9d21c0d8

hayeswang authored Jun 13, 2016

Move the set_speed() call from open() to rtl_hw_phy_work_func_t().
The default speed is then set only at first initialization
or after resuming.

In addition, set_speed() can handle the PHY_RESET flag, which
would be set in rtl_ops.hw_phy_cfg().
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8152: move the settings of PHY to a work queue · a028a9e0

hayeswang authored Jun 13, 2016

Move the settings of PHY to a work queue and schedule it after
rtl_ops.init().

There are some reasons for this. First, the settings are only
needed for first-time initialization or after a power
down occurs.

Second, the settings are independent of the others.

Last, the settings may take more time than the others. Leaving
them in probe() or open() may delay the flows that follow.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/sched: flower: Return error when hw can't offload and skip_sw is set · e8eb36cd

Amir Vadai authored Jun 13, 2016

When skip_sw is set and the hardware fails to apply the filter, return an
error to the user. This makes the error propagation logic similar to the
one currently used in the u32 classifier.
Also, change the code to use the tc_skip_sw() utility function.
Signed-off-by: Amir Vadai <amirva@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
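
The decision described above can be sketched in isolation. The flag bit, the helper and the function names are hypothetical stand-ins (the helper only mimics the role of tc_skip_sw()), and the error value is whatever the offload attempt reported.

```c
#include <errno.h>
#include <stdbool.h>

#define SKETCH_FLAG_SKIP_SW 0x1u   /* hypothetical flag bit */

/* Plays the role of a "skip software" predicate on the filter flags. */
static bool skip_sw_sketch(unsigned int flags)
{
	return (flags & SKETCH_FLAG_SKIP_SW) != 0;
}

/*
 * hw_err is the result of the offload attempt: 0 on success, negative on
 * failure. A failure is only fatal when there is no software fallback.
 */
static int add_filter_sketch(unsigned int flags, int hw_err)
{
	if (hw_err && skip_sw_sketch(flags))
		return hw_err;   /* nothing else can handle the filter */
	return 0;                /* hardware took it, or software will */
}
```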

14 Jun, 2016 10 commits

Merge branch 'bnxt_en-updates' · ce9355ac

David S. Miller authored Jun 14, 2016

Michael Chan says:

====================
bnxt_en: Updates for net-next.

-Add default VLAN support for VFs.
-Add NPAR (NIC partitioning) support.
-Add support for new device 5731x and 5741x. GRO logic is different.
-Support new ETHTOOL_{G|S}LINKSETTINGS.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: Support new ETHTOOL_{G|S}LINKSETTINGS API. · 00c04a92

Michael Chan authored Jun 13, 2016

To fully support 25G and 50G link settings.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: Don't allow autoneg on cards that don't support it. · 93ed8117

Michael Chan authored Jun 13, 2016

Some cards do not support autoneg.  The current code does not prevent the
user from enabling autoneg with ethtool on such cards, causing confusion.
Firmware provides the autoneg capability information and we just need to
store it in the support_auto_speeds field in bnxt_link_info struct.
The ethtool set_settings() call will check this field before proceeding
with autoneg.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
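
A minimal sketch of the capability check follows. The struct is a hypothetical cut-down stand-in for bnxt_link_info that keeps only the support_auto_speeds field named in the message, and the choice of -EINVAL as the rejection code is an assumption.

```c
#include <errno.h>

/* Hypothetical cut-down link info; only the field named in the commit. */
struct link_info_sketch {
	unsigned int support_auto_speeds;   /* 0 means no autoneg capability */
};

/* Reject an autoneg request on a card that advertises no auto speeds. */
static int set_autoneg_sketch(const struct link_info_sketch *li, int want_autoneg)
{
	if (want_autoneg && !li->support_auto_speeds)
		return -EINVAL;   /* assumed error code */
	return 0;
}
```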

bnxt_en: Add BCM5731X and BCM5741X device IDs. · b24eb6ae

Michael Chan authored Jun 13, 2016

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: Add GRO logic for BCM5731X chips. · 94758f8d

Michael Chan authored Jun 13, 2016

Add bnxt_gro_func_5731x() to handle GRO packets for this chip. The
completion structures used in the new chip have new data to help determine
the header offsets. The offsets can be off by 4 if the packet is an
internal loopback packet (e.g. from one VF to another VF). Some additional
logic is added to adjust the offsets if it is a loopback packet.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: Refactor bnxt_gro_skb(). · 309369c9

Michael Chan authored Jun 13, 2016

Newer chips require different logic to handle GRO packets. So refactor
the code so that we can call different functions depending on the chip.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
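
One common way to "call different functions depending on the chip" is to pick a handler once and store it as a function pointer. The sketch below assumes that approach; the enum, struct and handler names are hypothetical, and the handlers return placeholder values rather than doing any GRO work.

```c
/* Hypothetical chip categories. */
enum chip_sketch {
	CHIP_SKETCH_OLDER,
	CHIP_SKETCH_5731X
};

struct nic_sketch {
	enum chip_sketch chip;
	int (*gro_func)(void);   /* chosen once, then called per packet */
};

static int gro_older_sketch(void) { return 1; }   /* placeholder: existing logic */
static int gro_5731x_sketch(void) { return 2; }   /* placeholder: newer-chip logic */

/* Select the GRO handler when the chip type becomes known. */
static void nic_init_sketch(struct nic_sketch *nic, enum chip_sketch chip)
{
	nic->chip = chip;
	nic->gro_func = (chip == CHIP_SKETCH_5731X) ? gro_5731x_sketch
						    : gro_older_sketch;
}
```

Selecting the handler up front keeps the per-packet path free of chip checks.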

bnxt_en: Define the supported chip numbers. · 659c805c

Michael Chan authored Jun 13, 2016

Define all the supported chip numbers and chip categories.  Store the
chip_num returned by firmware.  If the call to get the version and chip
number fails, we should abort.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: Add PCI device ID for 57404 NPAR devices. · ebcd4eeb

Michael Chan authored Jun 13, 2016

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: Enable NPAR (NIC Partitioning) Support. · 567b2abe

Satish Baddipadige authored Jun 13, 2016

NPAR type is read from bnxt_hwrm_func_qcfg.  Do not allow changing link
parameters if in NPAR mode since the port is shared among multiple
partitions.  The link parameters are set up by firmware.
Signed-off-by: Satish Baddipadige <sbaddipa@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: Handle VF_CFG_CHANGE event from firmware. · fc0f1929

Michael Chan authored Jun 13, 2016

When the VF driver gets this event, the VF configuration has changed (such
as default VLAN).  The VF driver will initiate a silent reset to pick up
the new configuration.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
