Commits · f53aaa31cce7b543e407da7e97690a700206f7b9 · Kirill Smelkov / linux

23 Jul, 2018 30 commits

net/mlx5: FW tracer, implement tracer logic · f53aaa31

Feras Daoud authored Jul 16, 2018

Implement FW tracer logic and registers access, initialization and
cleanup flows.

Initializing the tracer will be part of load one flow, as multiple
PFs will try to acquire ownership but only one will succeed and will
be the tracer owner.
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

f53aaa31

Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux · 7854ac44

Saeed Mahameed authored Jul 23, 2018

mlx5 core infrastructure updates and fixes.

From Eran:
 - Add MPEGC (Management PCIe General Configuration) registers and btis
 - Fix tristate and description for MLX5 module

rom Feras:
 - Add hardware structures for the firmware tracer

From Jainbo:
 - Core support for double vlan push/pop steering action

From Max:
 - Add XRQ commands definitions

From Noa:
 - Add missing SET_DRIVER_VERSION command translation

From Roi:
 - Use ERR_CAST() instead of coding it

From Tariq:
 - Better return types for CQE API
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

7854ac44

Merge branch 'lan743x-Add-features-to-lan743x-driver' · c9eaaa17

David S. Miller authored Jul 23, 2018

Bryan Whitehead says:

====================
lan743x: Add features to lan743x driver

This patch series adds extra features to the lan743x driver.

Updates for v4:
Patch 6/8 - Modified get/set_wol to use super set of
	    MAC and PHY driver support.
Patch 7/9 - In set_eee, return the return value from phy_ethtool_set_eee.

Updates for v3:
Removed patch 9 from this series, regarding PTP support
Patch 6/8 - Add call to phy_ethtool_get_wol to lan743x_ethtool_get_wol
Patch 7/8 - Add call to phy_ethtool_set_eee on (!eee->eee_enabled)

Updates for v2:
Patch 3/9 - Used ARRAY_SIZE macro in lan743x_ethtool_get_ethtool_stats.
Patch 5/9 - Used MAX_EEPROM_SIZE in lan743x_ethtool_set_eeprom.
Patch 6/9 - Removed unnecessary read of PMT_CTL.
	    Used CRC algorithm from lib.
	    Removed PHY interrupt settings from lan743x_pm_suspend
	    Change "#if CONFIG_PM" to "#ifdef CONFIG_PM"
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

c9eaaa17

lan743x: Add RSS support · 43e8fe9b

Bryan Whitehead authored Jul 23, 2018

Implement RSS support
Signed-off-by: Bryan Whitehead <Bryan.Whitehead@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

43e8fe9b

lan743x: Add EEE support · c9cf96bb

Bryan Whitehead authored Jul 23, 2018

Implement EEE support
Signed-off-by: Bryan Whitehead <Bryan.Whitehead@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c9cf96bb

lan743x: Add power management support · 4d94282a

Bryan Whitehead authored Jul 23, 2018

Implement power management
Supports suspend, resume, and Wake on LAN
Signed-off-by: Bryan Whitehead <Bryan.Whitehead@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4d94282a

lan743x: Add support for ethtool eeprom access · 69584604

Bryan Whitehead authored Jul 23, 2018

Implement ethtool eeprom access
Also provides access to OTP (One Time Programming)
Signed-off-by: Bryan Whitehead <Bryan.Whitehead@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

69584604

lan743x: Add support for ethtool message level · 2958337d

Bryan Whitehead authored Jul 23, 2018

Implement ethtool message level
Signed-off-by: Bryan Whitehead <Bryan.Whitehead@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

2958337d

lan743x: Add support for ethtool statistics · 8114e8a2

Bryan Whitehead authored Jul 23, 2018

Implement ethtool statistics
Signed-off-by: Bryan Whitehead <Bryan.Whitehead@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

8114e8a2

lan743x: Add support for ethtool link settings · 63b92a91

Bryan Whitehead authored Jul 23, 2018

Use default link setting functions
Signed-off-by: Bryan Whitehead <Bryan.Whitehead@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

63b92a91

lan743x: Add support for ethtool get_drvinfo · 0cf63226

Bryan Whitehead authored Jul 23, 2018

Implement ethtool get_drvinfo
Signed-off-by: Bryan Whitehead <Bryan.Whitehead@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

0cf63226

Merge branch 'sh_eth-clean-up-the-TSU-register-accessors' · 8760c4d6

David S. Miller authored Jul 23, 2018

Sergei Shtylyov says:

====================
sh_eth: clean up the TSU register accessors

Here's a set of 5 patches against DaveM's 'net-next.git' repo. They do
a final clean up of the TSU register accessors...
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

8760c4d6

sh_eth: make sh_eth_tsu_{read|write}_entry() prototypes symmetric · 51459d4c

Sergei Shtylyov authored Jul 23, 2018

sh_eth_tsu_read_entry() is still asymmetric with sh_eth_tsu_write_entry()
WRT their prototypes -- make them symmetric by passing to the former a TSU
register offset instead of its address and also adding the (now necessary)
'ndev' parameter...
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

51459d4c

sh_eth: make sh_eth_tsu_write_entry() take 'offset' parameter · 7a54c867

Sergei Shtylyov authored Jul 23, 2018

We can add the TSU register base address to a TSU register offset right
in sh_eth_tsu_write_entry(), no need to do it in its callers...
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7a54c867

sh_eth: call sh_eth_tsu_get_offset() from TSU register accessors · ecbecb0a

Sergei Shtylyov authored Jul 23, 2018

With sh_eth_tsu_get_offset() now actually returning TSU register's offset,
we can at last use it in sh_eth_tsu_{read|write}(). Somehow this saves 248
bytes of object code with AArch64 gcc 4.8.5... :-)
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ecbecb0a

sh_eth: make sh_eth_tsu_get_offset() match its name · 41414f0a

Sergei Shtylyov authored Jul 23, 2018

sh_eth_tsu_get_offset(), despite its name, returns a TSU register's address,
not its offset. Make this function match its name and return a register's
offset from the TSU registers base address instead.
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

41414f0a

sh_eth: uninline sh_eth_tsu_get_offset() · 388c4bb4

Sergei Shtylyov authored Jul 23, 2018

sh_eth_tsu_get_offset() is called several  times  by the driver, remove
*inline* and move  that function  from the header to the driver  itself
to let gcc decide  whether to expand it inline or not...
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

388c4bb4

wan/fsl_ucc_hdlc: use IS_ERR_VALUE() to check return value of qe_muram_alloc · fd800f64

YueHaibing authored Jul 23, 2018

qe_muram_alloc return a unsigned long integer,which should not
compared with zero. check it using IS_ERR_VALUE() to fix this.

Fixes: c19b6d24 ("drivers/net: support hdlc function for QE-UCC")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fd800f64

Merge branch 'smc-next' · 6a525818

David S. Miller authored Jul 23, 2018

Ursula Braun says:

====================
net/smc: patches 2018-07-23

here are some small patches for SMC: Just the first patch contains a
functional change. It allows to differ between the modes SMCR and SMCD
on s390 when monitoring SMC sockets. The remaining patches are cleanups
without functional changes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

6a525818

net/smc: remove local variable page in smc_rx_splice() · 48bf5231

Ursula Braun authored Jul 23, 2018

The page map address is already stored in the RMB descriptor.
There is no need to derive it from the cpu_addr value.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

48bf5231

net/smc: use DECLARE_BITMAP for rtokens_used_mask · 144ce4b9

Ursula Braun authored Jul 23, 2018

Link group field tokens_used_mask is a bitmap. Use macro
DECLARE_BITMAP for its definition.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

144ce4b9

net/smc: add function to get link group from link · 00e5fb26

Stefan Raspl authored Jul 23, 2018

Replace a frequently used construct with a more readable variant,
reducing the code. Also might come handy when we start to support
more than a single per link group.
Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

00e5fb26

net/smc: eliminate cursor read and write calls · bac6de7b

Stefan Raspl authored Jul 23, 2018

The functions to read and write cursors are exclusively used to copy
cursors. Therefore switch to a respective function instead.
Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bac6de7b

net/smc: provide smc mode in smc_diag.c · c601171d

Karsten Graul authored Jul 23, 2018

Rename field diag_fallback into diag_mode and set the smc mode of a
connection explicitly.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c601171d

selftests: forwarding: gre_multipath: Drop IPv6 tests · 9a2ad362

Petr Machata authored Jul 23, 2018

Support for device-only IPv6 multipath next hops was dropped in
commit 33bd5ac5 ("net/ipv6: Revert attempt to simplify route replace
and append") and as of commit b5d2d75e ("net/ipv6: Do not allow
device only routes via the multipath API"), attempts to add a next hop
like that yield an explicit diagnostic.

Correspondingly, drop the IPv6 parts of GRE multipath test that are
supposed to test that code.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9a2ad362

ipv6: sr: Use kmemdup instead of duplicating it in parse_nla_srh · 7fa41efa

YueHaibing authored Jul 23, 2018

Replace calls to kmalloc followed by a memcpy with a direct call to
kmemdup.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7fa41efa

Merge branch 'net-bridge-add-support-for-backup-port' · f8b2990f

David S. Miller authored Jul 23, 2018

Nikolay Aleksandrov says:

====================
net: bridge: add support for backup port

This set introduces a new bridge port option that allows any port to have
any other port (in the same bridge of course) as its backup and traffic
will be forwarded to the backup port when the primary goes down. This is
mainly used in MLAG and EVPN setups where we have peerlink path which is
a backup of many (or even all) ports and is a participating bridge port
itself. There's more detailed information in patch 02. Patch 01 just
prepares the port sysfs code for options that take raw value. The main
issues that this set solves are scalability and fallback latency.

We have used similar code for over 6 months now to bring the fallback
latency of the backup peerlink down and avoid fdb notification storms.
Also due to the nature of master devices such setup is currently not
possible, and last but not least having tens of thousands of fdbs require
thousands of calls to switch.

I've also CCed our MLAG experts that have been using similar option.

Roopa also adds:

"Two switches acting in a MLAG pair are connected by the peerlink
interface which is a bridge port.

the config on one of the switches looks like the below. The other
switch also has a similar config.
eth0 is connected to one port on the server. And the server is
connected to both switches.

br0 -- team0---eth0
      |
      -- switch-peerlink

switch-peerlink becomes the failover/backport port when say team0 to
the server goes down.
Today, when team0 goes down, control plane has to withdraw all the fdb
entries pointing to team0
and re-install the fdb entries pointing to switch-peerlink...and
restore the fdb entries when team0 comes back up again.
and  this is the problem we are trying to solve.

This also becomes necessary when multihoming is implemented by a
standard like E-VPN https://tools.ietf.org/html/rfc8365#section-8
where the 'switch-peerlink' is an overlay vxlan port (like nikolay
mentions in his patch commit). In these implementations, the fdb scale
can be much larger.

On why bond failover cannot be used here ?: the point that nikolay was
alluding to is, switch-peerlink in the above example is a bridge port
and is a failover/backport port for more than one or all ports in the
bridge br0. And you cannot enslave switch-peerlink into a second level
team
with other bridge ports. Hence a multi layered team device is not an
option (FWIW, switch-peerlink is also a teamed interface to the peer
switch)."

v3: Added Roopa's explanation and diagram
v2: In patch 01 use kstrdup/kfree to avoid casting the const buf. In order
to avoid using GFP_ATOMIC or always allocating I kept the spinlock inside
each branch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

f8b2990f

net: bridge: add support for backup port · 2756f68c

Nikolay Aleksandrov authored Jul 23, 2018

This patch adds a new port attribute - IFLA_BRPORT_BACKUP_PORT, which
allows to set a backup port to be used for known unicast traffic if the
port has gone carrier down. The backup pointer is rcu protected and set
only under RTNL, a counter is maintained so when deleting a port we know
how many other ports reference it as a backup and we remove it from all.
Also the pointer is in the first cache line which is hot at the time of
the check and thus in the common case we only add one more test.
The backup port will be used only for the non-flooding case since
it's a part of the bridge and the flooded packets will be forwarded to it
anyway. To remove the forwarding just send a 0/non-existing backup port.
This is used to avoid numerous scalability problems when using MLAG most
notably if we have thousands of fdbs one would need to change all of them
on port carrier going down which takes too long and causes a storm of fdb
notifications (and again when the port comes back up). In a Multi-chassis
Link Aggregation setup usually hosts are connected to two different
switches which act as a single logical switch. Those switches usually have
a control and backup link between them called peerlink which might be used
for communication in case a host loses connectivity to one of them.
We need a fast way to failover in case a host port goes down and currently
none of the solutions (like bond) cannot fulfill the requirements because
the participating ports are actually the "master" devices and must have the
same peerlink as their backup interface and at the same time all of them
must participate in the bridge device. As Roopa noted it's normal practice
in routing called fast re-route where a precalculated backup path is used
when the main one is down.
Another use case of this is with EVPN, having a single vxlan device which
is backup of every port. Due to the nature of master devices it's not
currently possible to use one device as a backup for many and still have
all of them participate in the bridge (which is master itself).
More detailed information about MLAG is available at the link below.
https://docs.cumulusnetworks.com/display/DOCS/Multi-Chassis+Link+Aggregation+-+MLAG

Further explanation and a diagram by Roopa:
Two switches acting in a MLAG pair are connected by the peerlink
interface which is a bridge port.

the config on one of the switches looks like the below. The other
switch also has a similar config.
eth0 is connected to one port on the server. And the server is
connected to both switches.

br0 -- team0---eth0
|
-- switch-peerlink
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2756f68c

net: bridge: add support for raw sysfs port options · a5f3ea54

Nikolay Aleksandrov authored Jul 23, 2018

This patch adds a new alternative store callback for port sysfs options
which takes a raw value (buf) and can use it directly. It is needed for the
backup port sysfs support since we have to pass the device by its name.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a5f3ea54

net: mediatek: use dma_zalloc_coherent instead of allocator/memset · 0a78c380

YueHaibing authored Jul 23, 2018

Use dma_zalloc_coherent instead of dma_alloc_coherent
followed by memset 0.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0a78c380

22 Jul, 2018 10 commits

nfp: avoid buffer leak when FW communication fails · 07300f77

Jakub Kicinski authored Jul 20, 2018

After device is stopped we reset the rings by moving all free buffers
to positions [0, cnt - 2], and clear the position cnt - 1 in the ring.
We then proceed to clear the read/write pointers. This means that if
we try to reset the ring again the code will assume that the next to
fill buffer is at position 0 and swap it with cnt - 1. Since we
previously cleared position cnt - 1 it will lead to leaking the first
buffer and leaving ring in a bad state.

This scenario can only happen if FW communication fails, in which case
the ring will never be used again, so the fact it's in a bad state will
not be noticed. Buffer leak is the only problem. Don't try to move
buffers in the ring if the read/write pointers indicate the ring was
never used or have already been reset.

nfp_net_clear_config_and_disable() is now fully idempotent.

Found by code inspection, FW communication failures are very rare,
and reconfiguring a live device is not common either, so it's unlikely
anyone has ever noticed the leak.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

07300f77

nfp: bring back support for offloading shared blocks · 042f8825

Jakub Kicinski authored Jul 20, 2018

Now that we have offload replay infrastructure added by
commit 32636742 ("net: sched: call reoffload op on block callback reg")
and flows are guaranteed to be removed correctly, we can revert
commit 951a8ee6 ("nfp: reject binding to shared blocks").
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

042f8825

xen-netfront: fix queue name setting · 2d408c0d

Vitaly Kuznetsov authored Jul 20, 2018

Commit f599c64f ("xen-netfront: Fix race between device setup and
open") changed the initialization order: xennet_create_queues() now
happens before we do register_netdev() so using netdev->name in
xennet_init_queue() is incorrect, we end up with the following in
/proc/interrupts:

60: 139 0 xen-dyn -event eth%d-q0-tx
61: 265 0 xen-dyn -event eth%d-q0-rx
62: 234 0 xen-dyn -event eth%d-q1-tx
63: 1 0 xen-dyn -event eth%d-q1-rx

and this looks ugly. Actually, using early netdev name (even when it's
already set) is also not ideal: nowadays we tend to rename eth devices
and queue name may end up not corresponding to the netdev name.

Use nodename from xenbus device for queue naming: this can't change in VM's
lifetime. Now /proc/interrupts looks like

62: 202 0 xen-dyn -event device/vif/0-q0-tx
63: 317 0 xen-dyn -event device/vif/0-q0-rx
64: 262 0 xen-dyn -event device/vif/0-q1-tx
65: 17 0 xen-dyn -event device/vif/0-q1-rx

Fixes: f599c64f ("xen-netfront: Fix race between device setup and open")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2d408c0d

net/dsa/realtek: add MODULE_LICENSE() · be5a8ffa

Randy Dunlap authored Jul 20, 2018

Add MODULE_LICENSE() to net/dsa/realtek.o to fix build warning message.

WARNING: modpost: missing MODULE_LICENSE() in drivers/net/dsa/realtek.o
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

be5a8ffa

bonding: don't cast const buf in sysfs store · 5b3df177

Nikolay Aleksandrov authored Jul 22, 2018

As was recently discussed [1], let's avoid casting the const buf in
bonding_sysfs_store_option and use kstrndup/kfree instead.

[1] http://lists.openwall.net/netdev/2018/07/22/25Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5b3df177

Merge branch 'TX-used-ring-batched-updating-for-vhost' · fb42c838

David S. Miller authored Jul 22, 2018

Jason Wang says:

====================
TX used ring batched updating for vhost

This series implement batch updating of used ring for TX. This help to
reduce the cache contention on used ring. The idea is first split
datacopy path from zerocopy, and do only batching for datacopy. This
is because zercopy had already supported its own batching.

TX PPS was increased 25.8% and Netperf TCP does not show obvious
differences.

The split of datapath will also be helpful for future implementation
like in order completion.
====================
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fb42c838

vhost_net: batch update used ring for datacopy TX · 4afb52c2

Jason Wang authored Jul 20, 2018

Like commit e2b3b35e ("vhost_net: batch used ring update in rx"),
this patches implements batch used ring update for datacopy TX
(zerocopy has already done some kind of batching).

Testpmd transmission from guest to host (XDP_DROP on tap) shows 25.8%
improvement (from ~3.1Mpps to ~3.9Mpps) on Broadwell i7-5600U CPU @
2.60GHz machine. Netperf TCP tests does not show obvious differences.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4afb52c2

vhost_net: rename VHOST_RX_BATCH to VHOST_NET_BATCH · d0d86971

Jason Wang authored Jul 20, 2018

A more generic name which could be used for TX as well.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d0d86971

vhost_net: rename vhost_rx_signal_used() to vhost_net_signal_used() · 09c32489

Jason Wang authored Jul 20, 2018

Rename for reusing this for TX.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

09c32489

vhost_net: split out datacopy logic · 0d20bdf3

Jason Wang authored Jul 20, 2018

Instead of mixing zerocopy and datacopy logics, this patch tries to
split datacopy logic out. This results for a more compact code and
ad-hoc optimization could be done on top more easily.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0d20bdf3