Commits · be7ccdc36ba4815ca71b0ac6df898237a912b3ac · Kirill Smelkov / linux

05 Oct, 2015 10 commits

Arnd Bergmann authored Sep 30, 2015

The fec_ptp_enable_pps uses an open-coded implementation of ns_to_timespec,
which will be removed eventually as it is not y2038-safe on 32-bit
architectures. Two more instances of the same code in this file were
already converted to use the safe ns_to_timespec64 in commit 6630514f
("ptp: fec: use helpers for converting ns to timespec"), this changes
the last one as well.

The seconds portion here is actually unused and we could just remove the
timespec variable, but using ns_to_timespec64 can still be better as the
implementation can be hand-optimized in the future.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Fugang Duan <b38611@freescale.com>
Cc: Luwei Zhou <b45643@freescale.com>
Cc: Frank Li <Frank.Li@freescale.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be7ccdc3

Merge branch 'ipv4-multipath-hash' · 07355737

David S. Miller authored Oct 05, 2015

Peter Nørlund says:

====================
ipv4: Hash-based multipath routing

When the routing cache was removed in 3.6, the IPv4 multipath algorithm changed
from more or less being destination-based into being quasi-random per-packet
scheduling. This increases the risk of out-of-order packets and makes it
impossible to use multipath together with anycast services.

This patch series replaces the old implementation with flow-based load
balancing based on a hash over the source and destination addresses.

Distribution of the hash is done with thresholds as described in RFC 2992.
This reduces the disruption when a path is added/remove when having more than
two paths.

To futher the chance of successful usage in conjuction with anycast, ICMP
error packets are hashed over the inner IP addresses. This ensures that PMTU
will work together with anycast or load-balancers such as IPVS.

Port numbers are not considered since fragments could cause problems with
anycast and IPVS. Relying on the DF-flag for TCP packets is also insufficient,
since ICMP inspection effectively extracts information from the opposite
flow which might have a different state of the DF-flag. This is also why the
RSS hash is not used. These are typically based on the NDIS RSS spec which
mandates TCP support.

Measurements of the additional overhead of a two-path multipath
(p_mkroute_input excl. __mkroute_input) on a Xeon X3550 (4 cores, 2.66GHz):

Original per-packet: ~394 cycles/packet
L3 hash:              ~76 cycles/packet

Changes in v5:
- Fixed compilation error

Changes in v4:
- Functions take hash directly instead of func ptr
- Added inline hash function
- Added dummy macros to minimize ifdefs
- Use upper 31 bits of hash instead of lower

Changes in v3:
- Multipath algorithm is no longer configurable (always L3)
- Added random seed to hash
- Moved ICMP inspection to isolated function
- Ignore source quench packets (deprecated as per RFC 6633)

Changes in v2:
- Replaced 8-bit xor hash with 31-bit jenkins hash
- Don't scale weights (since 31-bit)
- Avoided unnecesary renaming of variables
- Rely on DF-bit instead of fragment offset when checking for fragmentation
- upper_bound is now inclusive to avoid overflow
- Use a callback to postpone extracting flow information until necessary
- Skipped ICMP inspection entirely with L4 hashing
- Handle newly added sysctl ignore_routes_with_linkdown
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

07355737

ipv4: ICMP packet inspection for multipath · 79a13159

Peter Nørlund authored Sep 30, 2015

ICMP packets are inspected to let them route together with the flow they
belong to, minimizing the chance that a problematic path will affect flows
on other paths, and so that anycast environments can work with ECMP.
Signed-off-by: Peter Nørlund <pch@ordbogen.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

79a13159

ipv4: L3 hash-based multipath · 0e884c78

Peter Nørlund authored Sep 30, 2015

Replaces the per-packet multipath with a hash-based multipath using
source and destination address.
Signed-off-by: Peter Nørlund <pch@ordbogen.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0e884c78

Merge branch 'tcp-listener-fixes-and-improvement' · 2472186f

David S. Miller authored Oct 05, 2015

Eric Dumazet says:

====================
tcp: lockless listener fixes and improvement

This fixes issues with TCP FastOpen vs lockless listeners,
and SYNACK being attached to request sockets.

Then, last patch brings performance improvement for
syncookies generation and validation.

Tested under a 4.3 Mpps SYNFLOOD attack, new perf profile looks
like :
    12.11%  [kernel]  [k] sha_transform
     5.83%  [kernel]  [k] tcp_conn_request
     4.59%  [kernel]  [k] __inet_lookup_listener
     4.11%  [kernel]  [k] ipt_do_table
     3.91%  [kernel]  [k] tcp_make_synack
     3.05%  [kernel]  [k] fib_table_lookup
     2.74%  [kernel]  [k] sock_wfree
     2.66%  [kernel]  [k] memcpy_erms
     2.12%  [kernel]  [k] tcp_v4_rcv
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

2472186f

tcp: avoid two atomic ops for syncookies · a1a5344d

Eric Dumazet authored Oct 04, 2015

inet_reqsk_alloc() is used to allocate a temporary request
in order to generate a SYNACK with a cookie. Then later,
syncookie validation also uses a temporary request.

These paths already took a reference on listener refcount,
we can avoid a couple of atomic operations.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a1a5344d

net: use sk_fullsock() in __netdev_pick_tx() · 004a5d01

Eric Dumazet authored Oct 04, 2015

SYN_RECV & TIMEWAIT sockets are not full blown, they do not have a
sk_dst_cache pointer.

Fixes: ca6fb065 ("tcp: attach SYNACK messages to request sockets instead of listener")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

004a5d01

ipv6: inet6_sk() should use sk_fullsock() · e7eadb4d

Eric Dumazet authored Oct 04, 2015

SYN_RECV & TIMEWAIT sockets are not full blown, they do not have a pinet6
pointer.

Fixes: ca6fb065 ("tcp: attach SYNACK messages to request sockets instead of listener")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e7eadb4d

inet: ip_skb_dst_mtu() should use sk_fullsock() · caf3f267

Eric Dumazet authored Oct 04, 2015

SYN_RECV & TIMEWAIT sockets are not full blown,
do not even try to call ip_sk_use_pmtu() on them.

Fixes: ca6fb065 ("tcp: attach SYNACK messages to request sockets instead of listener")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

caf3f267

tcp: fix fastopen races vs lockless listener · 7656d842

Eric Dumazet authored Oct 04, 2015

There are multiple races that need fixes :

1) skb_get() + queue skb + kfree_skb() is racy

An accept() can be done on another cpu, data consumed immediately.
tcp_recvmsg() uses __kfree_skb() as it is assumed all skb found in
socket receive queue are private.

Then the kfree_skb() in tcp_rcv_state_process() uses an already freed skb

2) tcp_reqsk_record_syn() needs to be done before tcp_try_fastopen()
for the same reasons.

3) We want to send the SYNACK before queueing child into accept queue,
otherwise we might reintroduce the ooo issue fixed in
commit 7c85af88 ("tcp: avoid reorders for TFO passive connections")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7656d842

04 Oct, 2015 26 commits

Merge branch 'bridge-netlink' · 3e087caa

David S. Miller authored Oct 04, 2015

Nikolay Aleksandrov says:

====================
bridge: complete netlink support

This set completes the bridge device's netlink support and makes it
possible to view and configure everything that can be configured via
sysfs. I have tested all of these (setting and getting). There're a few
longer line warnings about the br_get_size() ifla comments but I think we
should have them to know what has been accounted for. I have used the sysfs
interface as a guide of what and how to set. As usual I'll send the
corresponding iproute2 patches later.
The bridge port's netlink interface will be completed after this set gets
applied in some form.

This patch-set is on top of my last vlan cleanups set:
http://www.spinics.net/lists/netdev/msg346005.html
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

3e087caa

bridge: netlink: add support for default_pvid · 0f963b75

Nikolay Aleksandrov authored Oct 04, 2015

Add IFLA_BR_VLAN_DEFAULT_PVID to allow setting/getting bridge's
default_pvid via netlink.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0f963b75

bridge: netlink: add support for netfilter tables config · 93870cc0

Nikolay Aleksandrov authored Oct 04, 2015

Add support to allow getting/setting netfilter tables settings.
Currently these are IFLA_BR_NF_CALL_IPTABLES, IFLA_BR_NF_CALL_IP6TABLES
and IFLA_BR_NF_CALL_ARPTABLES.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

93870cc0

bridge: netlink: add support for igmp's intervals · 7e4df51e

Nikolay Aleksandrov authored Oct 04, 2015

Add support to set/get all of the igmp's configurable intervals via
netlink. These currently are:
IFLA_BR_MCAST_LAST_MEMBER_INTVL
IFLA_BR_MCAST_MEMBERSHIP_INTVL
IFLA_BR_MCAST_QUERIER_INTVL
IFLA_BR_MCAST_QUERY_INTVL
IFLA_BR_MCAST_QUERY_RESPONSE_INTVL
IFLA_BR_MCAST_STARTUP_QUERY_INTVL
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7e4df51e

bridge: netlink: add support for multicast_startup_query_count · b89e6bab

Nikolay Aleksandrov authored Oct 04, 2015

Add IFLA_BR_MCAST_STARTUP_QUERY_CNT to allow setting/getting
br->multicast_startup_query_count via netlink. Also align the ifla
comments.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b89e6bab

bridge: netlink: add support for multicast_last_member_count · 79b859f5

Nikolay Aleksandrov authored Oct 04, 2015

Add IFLA_BR_MCAST_LAST_MEMBER_CNT to allow setting/getting
br->multicast_last_member_count via netlink.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

79b859f5

bridge: netlink: add support for igmp's hash_max · 858079fd

Nikolay Aleksandrov authored Oct 04, 2015

Add IFLA_BR_MCAST_HASH_MAX to allow setting/getting br->hash_max via
netlink.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

858079fd

bridge: netlink: add support for igmp's hash_elasticity · 431db3c0

Nikolay Aleksandrov authored Oct 04, 2015

Add IFLA_BR_MCAST_HASH_ELASTICITY to allow setting/getting
br->hash_elasticity via netlink.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

431db3c0

bridge: netlink: add support for multicast_querier · ba062d7c

Nikolay Aleksandrov authored Oct 04, 2015

Add IFLA_BR_MCAST_QUERIER to allow setting/getting br->multicast_querier
via netlink.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ba062d7c

bridge: netlink: add support for multicast_query_use_ifaddr · 295141d9

Nikolay Aleksandrov authored Oct 04, 2015

Add IFLA_BR_MCAST_QUERY_USE_IFADDR to allow setting/getting
br->multicast_query_use_ifaddr via netlink.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

295141d9

bridge: netlink: add support for multicast_snooping · 89126327

Nikolay Aleksandrov authored Oct 04, 2015

Add IFLA_BR_MCAST_SNOOPING to allow enabling/disabling multicast
snooping via netlink.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

89126327

bridge: netlink: add support for multicast_router · a9a6bc70

Nikolay Aleksandrov authored Oct 04, 2015

Add IFLA_BR_MCAST_ROUTER to allow setting and retrieving
br->multicast_router when igmp snooping is enabled.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a9a6bc70

bridge: netlink: add fdb flush · 150217c6

Nikolay Aleksandrov authored Oct 04, 2015

Simple attribute that flushes the bridge's fdb.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

150217c6

bridge: netlink: add group_addr support · 111189ab

Nikolay Aleksandrov authored Oct 04, 2015

Add IFLA_BR_GROUP_ADDR attribute to allow setting and retrieving the
group_addr via netlink.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

111189ab

bridge: netlink: export all timers · d76bd14e

Nikolay Aleksandrov authored Oct 04, 2015

Export the following bridge timers (also exported via sysfs):
IFLA_BR_HELLO_TIMER, IFLA_BR_TCN_TIMER, IFLA_BR_TOPOLOGY_CHANGE_TIMER,
IFLA_BR_GC_TIMER via netlink.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d76bd14e

bridge: netlink: export topology_change and topology_change_detected · ed416309

Nikolay Aleksandrov authored Oct 04, 2015

Add IFLA_BR_TOPOLOGY_CHANGE and IFLA_BR_TOPOLOGY_CHANGE_DETECTED and
export them via netlink.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ed416309

bridge: netlink: export root path cost · 684dd248

Nikolay Aleksandrov authored Oct 04, 2015

Add IFLA_BR_ROOT_PATH_COST and export it via netlink.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

684dd248

bridge: netlink: export root port · 8762ba68

Nikolay Aleksandrov authored Oct 04, 2015

Add IFLA_BR_ROOT_PORT and export it via netlink.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8762ba68

bridge: netlink: export bridge id · 7599a220

Nikolay Aleksandrov authored Oct 04, 2015

Add IFLA_BR_BRIDGE_ID and export br->bridge_id via netlink.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7599a220

bridge: netlink: export root id · 5127c81f

Nikolay Aleksandrov authored Oct 04, 2015

Add IFLA_BR_ROOT_ID and export br->designated_root via netlink. For this
purpose add struct ifla_bridge_id that would represent struct bridge_id.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5127c81f

bridge: netlink: add group_fwd_mask support · 7910228b

Nikolay Aleksandrov authored Oct 04, 2015

Add IFLA_BR_GROUP_FWD_MASK attribute to allow setting and retrieving the
group_fwd_mask via netlink.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7910228b

Merge branch 'bridge-vlan' · 68d4e520

David S. Miller authored Oct 04, 2015

Nikolay Aleksandrov says:

====================
bridge: vlan: cleanups & fixes (part 2)

This is the second follow-up set with one fix (patch 01) and more cleanups
(patches 02,03 and 04). These are minor compared to the previous ones and
should be the last before taking on the optimization changes on the
fast-path.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

68d4e520

bridge: vlan: use br_vlan_should_use to simplify __vlan_add/del · 6be144f6

Nikolay Aleksandrov authored Oct 02, 2015

The checks that lead to num_vlans change are always what
br_vlan_should_use checks for, namely if the vlan is only a context or
not and depending on that it's either not counted or counted
as a real/used vlan respectively.
Also give better explanation in br_vlan_should_use's comment.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6be144f6

bridge: vlan: drop master_flags from __vlan_add · 2ffdf508

Nikolay Aleksandrov authored Oct 02, 2015

There's only one user now and we can include the flag directly.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2ffdf508

bridge: vlan: use br_vlan_(get|put)_master to deal with refcounts · f8ed289f

Nikolay Aleksandrov authored Oct 02, 2015

Introduce br_vlan_(get|put)_master which take a reference (or create the
master vlan first if it didn't exist) and drop a reference respectively.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f8ed289f

bridge: vlan: use rcu list for the ordered vlan list · 586c2b57

Nikolay Aleksandrov authored Oct 02, 2015

When I did the conversion to rhashtable I missed the required locking of
one important user of the vlan list - br_get_link_af_size_filtered()
which is called:
br_ifinfo_notify() -> br_nlmsg_size() -> br_get_link_af_size_filtered()
and the notifications can be sent without holding rtnl. Before this
conversion the function relied on using rcu and since we already use rcu to
destroy the vlans, we can simply migrate the list to use the rcu helpers.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

586c2b57

03 Oct, 2015 4 commits

tcp/dccp: add SLAB_DESTROY_BY_RCU flag for request sockets · e96f78ab

Eric Dumazet authored Oct 03, 2015

Before letting request sockets being put in TCP/DCCP regular
ehash table, we need to add either :

- SLAB_DESTROY_BY_RCU flag to their kmem_cache
- add RCU grace period before freeing them.

Since we carefully respected the SLAB_DESTROY_BY_RCU protocol
like ESTABLISH and TIMEWAIT sockets, use it here.

req_prot_init() being only used by TCP and DCCP, I did not add
a new slab_flags into their rsk_prot, but reuse prot->slab_flags

Since all reqsk_alloc() users are correctly dealing with a failure,
add the __GFP_NOWARN flag to avoid traces under pressure.

Fixes: 079096f1 ("tcp/dccp: install syn_recv requests into ehash table")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e96f78ab

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · 4236e2a1

David S. Miller authored Oct 03, 2015

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2015-09-30

This series contains updates to i40e and i40evf only.

Vasily Averin provides a couple of rtnl lock/unlock fixes for both i40e
and i40evf.

Shannon provides several updates and fixes, first fixes up a type clash
in i40e_aq_rc_to_posix(), where the error codes are signed values, so we
need to treat them as such.  Then fixes up a padding issue where an
extra byte is added in i40e_aqc_get_cee_dcb_cfg_v1_resp to directly
acknowledge the padding.  Updated i40e to keep debugfs register read
and writes from accessing outside of the io-remapped space.  Added
support and device id for another 20 GbE device.

Jesse fixes the transmit hand workaround code for ARM that was causing
Tx hangs to still occur occasionally when there really was no hang.  Then
fixed the receive dropped counter to show up in netstat interface.
Refactor the interrupt enable function since it was always making the
caller add the base_vector from the VSI struct which is already passed
to the function.  Fix kbuild warnings found in 0day build infrastructure
by adding a harmless cast to a dev_info(), also fix 32 bit build
warnings found by sparse.

Greg fixed a configuration error that results if a port VLAN is set
for a VF before the VF driver is loaded, so that when the VF driver is
loaded the port VLAN is ignored.

Mitch fixes the use of QOS field consistently in
i40e_ndo_set_vf_port_vlan().  Modified the init timing of the driver
to increase stability on load/unload and SR-IOV enable/disable cycles.

Anjali updates i40e to not collect VEB stats if they are disabled in the
hardware for performance reasons.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

4236e2a1

Merge branch 'ravb-r8a7795' · 28117b08

David S. Miller authored Oct 03, 2015

Simon Horman says:

====================
ravb: Add support for r8a7795 SoC

please consider this series for net-next.
It enhances the ravb driver to support the r8a7795 SoC.

Changes:

* Dropped RFC prefix
* Details in changelog of individual patches

Base:

* net-next/master

Availability:

To aid review of this in conjunction with other EtherAVB changes
the following branches are available in my renesas tree on kernel.org.

* me/r8a7795-ravb-driver-v4: this series
* me/r8a7795-ravb-pfc-v2: r8a7795 sh-pfc update for EthernetAVB
* me/r8a7795-ravb-integration-v4: enable EthernetAVB on r8a7795
* me/r8a7795-ravb-driver-and-integration-v4.runtime:
      the above three branches with their runtime dependencies
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

28117b08

ravb: Add support for r8a7795 SoC · 22d4df8f

Kazuya Mizuguchi authored Sep 30, 2015

This patch supports the r8a7795 SoC by:
- Using two interrupts
  + One for E-MAC
  + One for everything else
  + Both can be handled by the existing common interrupt handler, which
    affords a simpler update to support the new SoC. In future some
    consideration may be given to implementing multiple interrupt handlers
- Limiting the phy speed to 100Mbit/s for the new SoC;
  at this time it is not clear how this restriction may be lifted
  but I hope it will be possible as more information comes to light
Signed-off-by: Kazuya Mizuguchi <kazuya.mizuguchi.ks@renesas.com>
[horms: reworked]
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

22d4df8f