Commits · c28445c3cb07ba1da2c1dc7b5f3066c686a6acc6 · Kirill Smelkov / linux

18 Jan, 2017 13 commits

sctp: add reconf_enable in asoc ep and netns · c28445c3

Xin Long authored Jan 18, 2017

This patch is to add reconf_enable field in all of asoc ep and netns
to indicate if they support stream reset.

When initializing, asoc reconf_enable get the default value from ep
reconf_enable which is from netns netns reconf_enable by default.

It is also to add reconf_capable in asoc peer part to know if peer
supports reconf_enable, the value is set if ext params have reconf
chunk support when processing init chunk, just as rfc6525 section
5.1.1 demands.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c28445c3

sctp: add stream reconf primitive · 7a090b04

Xin Long authored Jan 18, 2017

This patch is to add a primitive based on sctp primitive frame for
sending stream reconf request. It works as the other primitives,
and create a SCTP_CMD_REPLY command to send the request chunk out.

sctp_primitive_RECONF would be the api to send a reconf request
chunk.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7a090b04

sctp: add stream reconf timer · 7b9438de

Xin Long authored Jan 18, 2017

This patch is to add a per transport timer based on sctp timer frame
for stream reconf chunk retransmission. It would start after sending
a reconf request chunk, and stop after receiving the response chunk.

If the timer expires, besides retransmitting the reconf request chunk,
it would also do the same thing with data RTO timer. like to increase
the appropriate error counts, and perform threshold management, possibly
destroying the asoc if sctp retransmission thresholds are exceeded, just
as section 5.1.1 describes.

This patch is also to add asoc strreset_chunk, it is used to save the
reconf request chunk, so that it can be retransmitted, and to check if
the response is really for this request by comparing the information
inside with the response chunk as well.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7b9438de

sctp: add support for generating stream reconf ssn reset request chunk · cc16f00f

Xin Long authored Jan 18, 2017

This patch is to add asoc strreset_outseq and strreset_inseq for
saving the reconf request sequence, initialize them when create
assoc and process init, and also to define Incoming and Outgoing
SSN Reset Request Parameter described in rfc6525 section 4.1 and
4.2, As they can be in one same chunk as section rfc6525 3.1-3
describes, it makes them in one function.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cc16f00f

Merge branch 'rework-inet_csk_get_port' · b16ed2b1

David S. Miller authored Jan 18, 2017

Josef Bacik says:

====================
Rework inet_csk_get_port

V3->V4:
-Removed the random include of addrconf.h that is no longer needed.

V2->V3:
-Dropped the fastsock from the tb and instead just carry the saddrs, family, and
 ipv6 only flag.
-Reworked the helper functions to deal with this change so I could still use
 them when checking the fast path.
-Killed tb->num_owners as per Eric's request.
-Attached a reproducer to the bottom of this email.

V1->V2:
-Added a new patch 'inet: collapse ipv4/v6 rcv_saddr_equal functions into one'
 at Hannes' suggestion.
-Dropped ->bind_conflict and just use the new helper.
-Fixed a compile bug from the original ->bind_conflict patch.

The original description of the series follows:

At some point recently the guys working on our load balancer added the ability
to use SO_REUSEPORT.  When they restarted their app with this option enabled
they immediately hit a softlockup on what appeared to be the
inet_bind_bucket->lock.  Eventually what all of our debugging and discussion led
us to was the fact that the application comes up without SO_REUSEPORT, shuts
down which creates around 100k twsk's, and then comes up and tries to open a
bunch of sockets using SO_REUSEPORT, which meant traversing the inet_bind_bucket
owners list under the lock.  Since this lock is needed for dealing with the
twsk's and basically anything else related to connections we would softlockup,
and sometimes not ever recover.

To solve this problem I did what you see in Path 5/5.  Once we have a
SO_REUSEPORT socket on the tb->owners list we know that the socket has no
conflicts with any of the other sockets on that list.  So we can add a copy of
the sock_common (really all we need is the recv_saddr but it seemed ugly to copy
just the ipv6, ipv4, and flag to indicate if we were ipv6 only in there so I've
copied the whole common) in order to check subsequent SO_REUSEPORT sockets.  If
they match the previous one then we can skip the expensive
inet_csk_bind_conflict check.  This is what eliminated the soft lockup that we
were seeing.

Patches 1-4 are cleanups and re-workings.  For instance when we specify port ==
0 we need to find an open port, but we would do two passes through
inet_csk_bind_conflict every time we found a possible port.  We would also keep
track of the smallest_port value in order to try and use it if we found no
port our first run through.  This however made no sense as it would have had to
fail the first pass through inet_csk_bind_conflict, so would not actually pass
the second pass through either.  Finally I split the function into two functions
in order to make it easier to read and to distinguish between the two behaviors.

I have tested this on one of our load balancing boxes during peak traffic and it
hasn't fallen over.  But this is not my area, so obviously feel free to point
out where I'm being stupid and I'll get it fixed up and retested.  Thanks,
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

b16ed2b1

inet: reset tb->fastreuseport when adding a reuseport sk · 637bc8bb

Josef Bacik authored Jan 17, 2017

If we have non reuseport sockets on a tb we will set tb->fastreuseport to 0 and
never set it again. Which means that in the future if we end up adding a bunch
of reuseport sk's to that tb we'll have to do the expensive scan every time.
Instead add the ipv4/ipv6 saddr fields to the bind bucket, as well as the family
so we know what comparison to make, and the ipv6 only setting so we can make
sure to compare with new sockets appropriately. Once one sk has made it onto
the list we know that there are no potential bind conflicts on the owners list
that match that sk's rcv_addr. So copy the sk's information into our bind
bucket and set tb->fastruseport to FASTREUSESOCK_STRICT so we know we have to do
an extra check for subsequent reuseport sockets and skip the expensive bind
conflict check.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

637bc8bb

inet: split inet_csk_get_port into two functions · 289141b7

Josef Bacik authored Jan 17, 2017

inet_csk_get_port does two different things, it either scans for an open port,
or it tries to see if the specified port is available for use. Since these two
operations have different rules and are basically independent lets split them
into two different functions to make them both more readable.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

289141b7

inet: don't check for bind conflicts twice when searching for a port · 6cd66616

Josef Bacik authored Jan 17, 2017

This is just wasted time, we've already found a tb that doesn't have a bind
conflict, and we don't drop the head lock so scanning again isn't going to give
us a different answer. Instead move the tb->reuse setting logic outside of the
found_tb path and put it in the success: path. Then make it so that we don't
goto again if we find a bind conflict in the found_tb path as we won't reach
this anymore when we are scanning for an ephemeral port.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6cd66616

inet: kill smallest_size and smallest_port · b9470c27

Josef Bacik authored Jan 17, 2017

In inet_csk_get_port we seem to be using smallest_port to figure out where the
best place to look for a SO_REUSEPORT sk that matches with an existing set of
SO_REUSEPORT's.  However if we get to the logic

if (smallest_size != -1) {
	port = smallest_port;
	goto have_port;
}

we will do a useless search, because we would have already done the
inet_csk_bind_conflict for that port and it would have returned 1, otherwise we
would have gone to found_tb and succeeded.  Since this logic makes us do yet
another trip through inet_csk_bind_conflict for a port we know won't work just
delete this code and save us the time.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b9470c27

inet: drop ->bind_conflict · aa078842

Josef Bacik authored Jan 17, 2017

The only difference between inet6_csk_bind_conflict and inet_csk_bind_conflict
is how they check the rcv_saddr, so delete this call back and simply
change inet_csk_bind_conflict to call inet_rcv_saddr_equal.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

aa078842

inet: collapse ipv4/v6 rcv_saddr_equal functions into one · fe38d2a1

Josef Bacik authored Jan 17, 2017

We pass these per-protocol equal functions around in various places, but
we can just have one function that checks the sk->sk_family and then do
the right comparison function.  I've also changed the ipv4 version to
not cast to inet_sock since it is unneeded.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fe38d2a1

stmicro: add more information to Kconfig · ab70e586

jpinto authored Jan 17, 2017

This patch adds more info to stmicro' Kconfig files in order to be clearer
that the driver can be used by ethernet cards based on 10/100/1000/EQOS
Synopsys IP Cores.

EQOS was also added stmmac/Kconfig Kconfig, since dwmac4 is in fact EQoS,
one of Synopsys Ethernet IPs. More info at:
https://www.synopsys.com/dw/ipdir.php?ds=dwc_ether_qosSigned-off-by: Joao Pinto <jpinto@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ab70e586

net/mlx5e: Support bpf_xdp_adjust_head() · d8bec2b2

Martin KaFai Lau authored Jan 17, 2017

This patch adds bpf_xdp_adjust_head() support to mlx5e.

1. rx_headroom is added to struct mlx5e_rq.  It uses
   an existing 4 byte hole in the struct.
2. The adjusted data length is checked against
   MLX5E_XDP_MIN_INLINE and MLX5E_SW2HW_MTU(rq->netdev->mtu).
3. The macro MLX5E_SW2HW_MTU is moved from en_main.c to en.h.
   MLX5E_HW2SW_MTU is also moved to en.h for symmetric reason
   but it is not a must.

v2:
- Keep the xdp specific logic in mlx5e_xdp_handle()
- Update dma_len after the sanity checks in mlx5e_xmit_xdp_frame()
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d8bec2b2

17 Jan, 2017 27 commits

tcp: accept RST for rcv_nxt - 1 after receiving a FIN · 0e40f4c9

Jason Baron authored Jan 17, 2017

Using a Mac OSX box as a client connecting to a Linux server, we have found
that when certain applications (such as 'ab'), are abruptly terminated
(via ^C), a FIN is sent followed by a RST packet on tcp connections. The
FIN is accepted by the Linux stack but the RST is sent with the same
sequence number as the FIN, and Linux responds with a challenge ACK per
RFC 5961. The OSX client then sometimes (they are rate-limited) does not
reply with any RST as would be expected on a closed socket.

This results in sockets accumulating on the Linux server left mostly in
the CLOSE_WAIT state, although LAST_ACK and CLOSING are also possible.
This sequence of events can tie up a lot of resources on the Linux server
since there may be a lot of data in write buffers at the time of the RST.
Accepting a RST equal to rcv_nxt - 1, after we have already successfully
processed a FIN, has made a significant difference for us in practice, by
freeing up unneeded resources in a more expedient fashion.

A packetdrill test demonstrating the behavior:

// testing mac osx rst behavior

// Establish a connection
0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
0.000 bind(3, ..., ...) = 0
0.000 listen(3, 1) = 0

0.100 < S 0:0(0) win 32768 <mss 1460,nop,wscale 10>
0.100 > S. 0:0(0) ack 1 <mss 1460,nop,wscale 5>
0.200 < . 1:1(0) ack 1 win 32768
0.200 accept(3, ..., ...) = 4

// Client closes the connection
0.300 < F. 1:1(0) ack 1 win 32768

// now send rst with same sequence
0.300 < R. 1:1(0) ack 1 win 32768

// make sure we are in TCP_CLOSE
0.400 %{
assert tcpi_state == 7
}%
Signed-off-by: Jason Baron <jbaron@akamai.com>
Cc: Eric Dumazet <edumazet@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0e40f4c9

net: ethoc: Make needlessly global struct ethtool_ops static · a870a977

Tobias Klauser authored Jan 17, 2017

Make the needlessly global struct ethtool_ops ethoc_ethtool_ops static
to fix a sparse warning.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

a870a977

Merge branch 'sfc-RX-hash-config' · 49a67b5d

David S. Miller authored Jan 17, 2017

Edward Cree says:

====================
sfc: RX hash configuration

This series improves support for getting and setting RX hashing
 configuration on Solarflare adapters through ethtool.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

49a67b5d

sfc: read back RX hash config from the NIC when querying it with ethtool -x · a707d188

Edward Cree authored Jan 17, 2017

Ensures that we report the key and indirection table the NIC is using,
 rather than (if setting them failed earlier) what we wanted it to use.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a707d188

sfc: support setting RSS hash key through ethtool API · f74d1995

Edward Cree authored Jan 17, 2017

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f74d1995

cxgb4: Implement ndo_get_phys_port_id for mgmt dev · 96fe11f2

Ganesh Goudar authored Jan 17, 2017

Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

96fe11f2

net: ping: Use right format specifier to avoid type casting · a7ef6715

Gao Feng authored Jan 17, 2017

The inet_num is u16, so use %hu instead of casting it to int. And
the sk_bound_dev_if is int actually, so it needn't cast to int.
Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a7ef6715

qed: Replace memset with eth_zero_addr · 0ee28e31

Shyam Saini authored Jan 17, 2017

Use eth_zero_addr to assign zero address to the given address array
instead of memset when the second argument in memset is address
of zero. Also, it makes the code clearer
Signed-off-by: Shyam Saini <mayhs11saini@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0ee28e31

bridge: sparse fixes in br_ip6_multicast_alloc_query() · 53631a5f

Lance Richardson authored Jan 16, 2017

Changed type of csum field in struct igmpv3_query from __be16 to
__sum16 to eliminate type warning, made same change in struct
igmpv3_report for consistency.

Fixed up an ntohs() where htons() should have been used instead.
Signed-off-by: Lance Richardson <lrichard@redhat.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

53631a5f

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 580bdf56
David S. Miller authored Jan 17, 2017

580bdf56

Merge branch 'mpls-packet-stats' · e60a4263

David S. Miller authored Jan 17, 2017

Robert Shearman says:

====================
mpls: Packet stats

This patchset records per-interface packet stats in the MPLS
forwarding path and exports them using a nest of attributes root at a
new IFLA_STATS_AF_SPEC attribute as part of RTM_GETSTATS messages:

[IFLA_STATS_AF_SPEC]
 -> [AF_MPLS]
  -> [MPLS_STATS_LINK]
   -> struct mpls_link_stats

The first patch adds the rtnl infrastructure for this, including a new
callbacks to per-AF ops of fill_stats_af and get_stats_af_size. The
second patch records MPLS stats and makes use of the infrastructure to
export them. The rtnl infrastructure could also be used to export IPv6
stats in the future.

Changes in v2:
 - make incrementing IPv6 stats in mpls_stats_inc_outucastpkts
   conditional on CONFIG_IPV6 to fix build with CONFIG_IPV6=n
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

e60a4263

mpls: Packet stats · 27d69105

Robert Shearman authored Jan 16, 2017

Having MPLS packet stats is useful for observing network operation and
for diagnosing network problems. In the absence of anything better,
RFC2863 and RFC3813 are used for guidance for which stats to expose
and the semantics of them. In particular rx_noroutes maps to in
unknown protos in RFC2863. The stats are exposed to userspace via
AF_MPLS attributes embedded in the IFLA_STATS_AF_SPEC attribute of
RTM_GETSTATS messages.

All the introduced fields are 64-bit, even error ones, to ensure no
overflow with long uptimes. Per-CPU counters are used to avoid
cache-line contention on the commonly used fields. The other fields
have also been made per-CPU for code to avoid performance problems in
error conditions on the assumption that on some platforms the cost of
atomic operations could be more expensive than sending the packet
(which is what would be done in the success case). If that's not the
case, we could instead not use per-CPU counters for these fields.

Only unicast and non-fragment are exposed at the moment, but other
counters can be exposed in the future either by adding to the end of
struct mpls_link_stats or by additional netlink attributes in the
AF_MPLS IFLA_STATS_AF_SPEC nested attribute.
Signed-off-by: Robert Shearman <rshearma@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

27d69105

net: AF-specific RTM_GETSTATS attributes · aefb4d4a

Robert Shearman authored Jan 16, 2017

Add the functionality for including address-family-specific per-link
stats in RTM_GETSTATS messages. This is done through adding a new
IFLA_STATS_AF_SPEC attribute under which address family attributes are
nested and then the AF-specific attributes can be further nested. This
follows the model of IFLA_AF_SPEC on RTM_*LINK messages and it has the
advantage of presenting an easily extended hierarchy. The rtnl_af_ops
structure is extended to provide AFs with the opportunity to fill and
provide the size of their stats attributes.

One alternative would have been to provide AFs with the ability to add
attributes directly into the RTM_GETSTATS message without a nested
hierarchy. I discounted this approach as it increases the rate at
which the 32 attribute number space is used up and it makes
implementation a little more tricky for stats dump resuming (at the
moment the order in which attributes are added to the message has to
match the numeric order of the attributes).

Another alternative would have been to register per-AF RTM_GETSTATS
handlers. I discounted this approach as I perceived a common use-case
to be getting all the stats for an interface and this approach would
necessitate multiple requests/dumps to retrieve them all.
Signed-off-by: Robert Shearman <rshearma@brocade.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

aefb4d4a

stmmac: add missing of_node_put · a249708b

Julia Lawall authored Jan 17, 2017

The function stmmac_dt_phy provides several possibilities for initializing
plat->mdio_node, all of which have the effect of increasing the reference
count of the assigned value. This field is not updated elsewhere, so the
value is live until the end of the lifetime of plat (devm_allocated), just
after the end of stmmac_remove_config_dt. Thus, add an of_node_put on
plat->mdio_node in stmmac_remove_config_dt. It is possible that the field
mdio_node is never initialized, but of_node_put is NULL-safe, so it is also
safe to call of_node_put in that case.
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a249708b

virtio: don't set VIRTIO_NET_HDR_F_DATA_VALID on xmit · 501db511

Rolf Neugebauer authored Jan 17, 2017

This patch part reverts fd2a0437 and e858fae2 which introduced a
subtle change in how the virtio_net flags are derived from the SKBs
ip_summed field.

With the above commits, the flags are set to VIRTIO_NET_HDR_F_DATA_VALID
when ip_summed == CHECKSUM_UNNECESSARY, thus treating it differently to
ip_summed == CHECKSUM_NONE, which should be the same.

Further, the virtio spec 1.0 / CS04 explicitly says that
VIRTIO_NET_HDR_F_DATA_VALID must not be set by the driver.

Fixes: fd2a0437 ("virtio_net: introduce virtio_net_hdr_{from,to}_skb")
Fixes: e858fae2 (" virtio_net: use common code for virtio_net_hdr and skb GSO conversion")
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

501db511

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 4b19a9e2

Linus Torvalds authored Jan 17, 2017

Pull networking fixes from David Miller:

1) Handle multicast packets properly in fast-RX path of mac80211, from
Johannes Berg.

2) Because of a logic bug, the user can't actually force SW
checksumming on r8152 devices. This makes diagnosis of hw
checksumming bugs really annoying. Fix from Hayes Wang.

3) VXLAN route lookup does not take the source and destination ports
into account, which means IPSEC policies cannot be matched properly.
Fix from Martynas Pumputis.

4) Do proper RCU locking in netvsc callbacks, from Stephen Hemminger.

5) Fix SKB leaks in mlxsw driver, from Arkadi Sharshevsky.

6) If lwtunnel_fill_encap() fails, we do not abort the netlink message
construction properly in fib_dump_info(), from David Ahern.

7) Do not use kernel stack for DMA buffers in atusb driver, from Stefan
Schmidt.

8) Openvswitch conntack actions need to maintain a correct checksum,
fix from Lance Richardson.

9) ax25_disconnect() is missing a check for ax25->sk being NULL, in
fact it already checks this, but not in all of the necessary spots.
Fix from Basil Gunn.

10) Action GET operations in the packet scheduler can erroneously bump
the reference count of the entry, making it unreleasable. Fix from
Jamal Hadi Salim. Jamal gives a great set of example command lines
that trigger this in the commit message.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (46 commits)
net sched actions: fix refcnt when GETing of action after bind
net/mlx4_core: Eliminate warning messages for SRQ_LIMIT under SRIOV
net/mlx4_core: Fix when to save some qp context flags for dynamic VST to VGT transitions
net/mlx4_core: Fix racy CQ (Completion Queue) free
net: stmmac: don't use netdev_[dbg, info, ..] before net_device is registered
net/mlx5e: Fix a -Wmaybe-uninitialized warning
ax25: Fix segfault after sock connection timeout
bpf: rework prog_digest into prog_tag
tipc: allocate user memory with GFP_KERNEL flag
net: phy: dp83867: allow RGMII_TXID/RGMII_RXID interface types
ip6_tunnel: Account for tunnel header in tunnel MTU
mld: do not remove mld souce list info when set link down
be2net: fix MAC addr setting on privileged BE3 VFs
be2net: don't delete MAC on close on unprivileged BE3 VFs
be2net: fix status check in be_cmd_pmac_add()
cpmac: remove hopeless #warning
ravb: do not use zero-length alignment DMA descriptor
mlx4: do not call napi_schedule() without care
openvswitch: maintain correct checksum state in conntrack actions
tcp: fix tcp_fastopen unaligned access complaints on sparc
...

4b19a9e2

Merge branch 'stable/for-linus-4.10' of... · 203f80f1

Linus Torvalds authored Jan 17, 2017

Merge branch 'stable/for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb

Pull swiotlb fix from Konrad Rzeszutek Wilk:
 "A tiny fix to make sure that page-sized mappings are page-aligned (and
  not say straddle two pages). This is important for some drivers (such
  as NVME)"

* 'stable/for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb:
  swiotlb: ensure that page-sized mappings are page-aligned

203f80f1

Merge tag 'mmc-v4.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc · 7e84b303

Linus Torvalds authored Jan 17, 2017

Pull MMC fixes from Ulf Hansson:
 "MMC core:
   - fix regressions detecting HS/HS DDR eMMC cards related to CMD6

  MMC host:
   - mmc: mxs-mmc: Fix additional cycles after transmission stop
   - sdhci-acpi: Only powered up enabled acpi child devices
   - meson: avoid possible NULL dereference"

* tag 'mmc-v4.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  mmc: core: Restore parts of the polling policy when switch to HS/HS DDR
  mmc: mxs-mmc: Fix additional cycles after transmission stop
  mmc: sdhci-acpi: Only powered up enabled acpi child devices
  MMC: meson: avoid possible NULL dereference

7e84b303

Merge tag 'for-linus-20170116' of git://git.infradead.org/linux-mtd · 7d8b8c09

Linus Torvalds authored Jan 17, 2017

Pull MTD fixes from Brian Norris:
 "Just NAND updates from Boris:

   - avoid compiling xway NAND controller driver as a module (which
     didn't work)

   - fix tango NAND DT binding and make sure the controller is in a
     clean state at probe time

   - add dependency on HAS_IOMEM to the oxnas NAND driver

   - fix irq number validity check in the lpc32xx driver"

* tag 'for-linus-20170116' of git://git.infradead.org/linux-mtd:
  mtd: nand: lpc32xx: fix invalid error handling of a requested irq
  mtd: nand: tango: Reset pbus to raw mode in probe
  mtd: nand: tango: Update DT binding description
  mtd: nand: oxnas_nand: fix build errors on arch/um, require HAS_IOMEM
  mtd: nand: xway: fix build because of module functions
  mtd: nand: xway: disable module support

7d8b8c09

net: marvell: sky2: use new api ethtool_{get|set}_link_ksettings · 55f78fcd

Philippe Reynes authored Jan 14, 2017

The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

55f78fcd

net: marvell: skge: use new api ethtool_{get|set}_link_ksettings · 0f826385

Philippe Reynes authored Jan 16, 2017

The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.

The callback set_link_ksettings no longer update the value
of advertising, as the struct ethtool_link_ksettings is
defined as const.

As I don't have the hardware, I'd be very pleased if
someone may test this patch.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0f826385

net: jme: use new api ethtool_{get|set}_link_ksettings · c523838c

Philippe Reynes authored Jan 15, 2017

The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.

As I don't have the hardware, I'd be very pleased if
someone may test this patch.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c523838c

net: korina: use new api ethtool_{get|set}_link_ksettings · af473688

Philippe Reynes authored Jan 14, 2017

The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

af473688

Merge branch 'mvneta-xmit_more-bql' · b8128c42

David S. Miller authored Jan 16, 2017

Marcin Wojtas says:

====================
mvneta xmit_more and bql support

This is a delayed v2 of short patchset, which introduces xmit_more and BQL
to mvneta driver. The only one change was added in xmit_more support -
condition check preventing excessive descriptors concatenation before
flushing in HW.

Any comments or feedback would be welcome.

Changelog:
v1 -> v2:

* Add checking condition that ensures too much descriptors are not
  concatenated before flushing in HW.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

b8128c42

net: mvneta: add BQL support · a29b6235

Marcin Wojtas authored Jan 16, 2017

Tests showed that when whole bandwidth is consumed, the latency for
various kind of traffic can reach high values. With saturated
link (e.g. with iperf from target to host) simple ping could take
significant amount of time. BQL proved to improve this situation
when implemented in mvneta driver. Measurements of ping latency
for 3 link speeds:
Speed | Latency w/o BQL | Latency with BQL
10    |      7-14 ms    |     3.5 ms
100   |      2-12 ms    |     0.6 ms
1000  |   often timeout |   up to 2ms

Decreasing latency as above result in sligt performance cost - 4kpps
(-1.4%) when pushing 64B packets via two bridged interfaces of Armada 38x.
For 1500B packets in the same setup, the mpstat tool showed +8% of
CPU occupation (default affinity, second CPU idle). Even though this
cost seems reasonable to take, considering other improvements.

This commit adds byte queue limit mechanism for the mvneta driver.
Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a29b6235

net: mvneta: add xmit_more support · 2a90f7e1

Simon Guinot authored Jan 16, 2017

Basing on xmit_more flag of the skb, TX descriptors can be concatenated
before flushing. This commit delay Tx descriptor flush if the queue is
running and if there is more skb's to send.

A maximum allowed number of descriptors for flushing at once due to
MVNETA_TXQ_UPDATE_REG(q) reqisters limitation, is 255. Because of that
a new macro was added (MVNETA_TXQ_DEC_SENT_MASK) in order to ensure that
concatenated amount of descriptor does not exceed that value.
Signed-off-by: Simon Guinot <simon.guinot@sequanux.org>
Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2a90f7e1

net sched actions: fix refcnt when GETing of action after bind · 0faa9cb5

Jamal Hadi Salim authored Jan 15, 2017

Demonstrating the issue:

.. add a drop action
$sudo $TC actions add action drop index 10

.. retrieve it
$ sudo $TC -s actions get action gact index 10

	action order 1: gact action drop
	 random type none pass val 0
	 index 10 ref 2 bind 0 installed 29 sec used 29 sec
 	Action statistics:
	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
	backlog 0b 0p requeues 0

... bug 1 above: reference is two.
    Reference is actually 1 but we forget to subtract 1.

... do a GET again and we see the same issue
    try a few times and nothing changes
~$ sudo $TC -s actions get action gact index 10

	action order 1: gact action drop
	 random type none pass val 0
	 index 10 ref 2 bind 0 installed 31 sec used 31 sec
 	Action statistics:
	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
	backlog 0b 0p requeues 0

... lets try to bind the action to a filter..
$ sudo $TC qdisc add dev lo ingress
$ sudo $TC filter add dev lo parent ffff: protocol ip prio 1 \
  u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 10

... and now a few GETs:
$ sudo $TC -s actions get action gact index 10

	action order 1: gact action drop
	 random type none pass val 0
	 index 10 ref 3 bind 1 installed 204 sec used 204 sec
 	Action statistics:
	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
	backlog 0b 0p requeues 0

$ sudo $TC -s actions get action gact index 10

	action order 1: gact action drop
	 random type none pass val 0
	 index 10 ref 4 bind 1 installed 206 sec used 206 sec
 	Action statistics:
	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
	backlog 0b 0p requeues 0

$ sudo $TC -s actions get action gact index 10

	action order 1: gact action drop
	 random type none pass val 0
	 index 10 ref 5 bind 1 installed 235 sec used 235 sec
 	Action statistics:
	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
	backlog 0b 0p requeues 0

.... as can be observed the reference count keeps going up.

After the fix

$ sudo $TC actions add action drop index 10
$ sudo $TC -s actions get action gact index 10

	action order 1: gact action drop
	 random type none pass val 0
	 index 10 ref 1 bind 0 installed 4 sec used 4 sec
 	Action statistics:
	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
	backlog 0b 0p requeues 0

$ sudo $TC -s actions get action gact index 10

	action order 1: gact action drop
	 random type none pass val 0
	 index 10 ref 1 bind 0 installed 6 sec used 6 sec
 	Action statistics:
	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
	backlog 0b 0p requeues 0

$ sudo $TC qdisc add dev lo ingress
$ sudo $TC filter add dev lo parent ffff: protocol ip prio 1 \
  u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 10

$ sudo $TC -s actions get action gact index 10

	action order 1: gact action drop
	 random type none pass val 0
	 index 10 ref 2 bind 1 installed 32 sec used 32 sec
 	Action statistics:
	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
	backlog 0b 0p requeues 0

$ sudo $TC -s actions get action gact index 10

	action order 1: gact action drop
	 random type none pass val 0
	 index 10 ref 2 bind 1 installed 33 sec used 33 sec
 	Action statistics:
	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
	backlog 0b 0p requeues 0

Fixes: aecc5cef ("net sched actions: fix GETing actions")
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0faa9cb5