Commits · ebe023a4248834f774d537898898ce7bcbec0958 · Kirill Smelkov / linux

29 Jul, 2018 27 commits

Merge tag 'mlx5e-updates-2018-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · ebe023a4

David S. Miller authored Jul 29, 2018

Saeed Mahameed says:

====================
mlx5e-updates-2018-07-27 (Vxlan updates)

This series from Gal and Saeed provides updates to mlx5 vxlan implementation.

Gal started with three cleanups to reflect the actual hardware vxlan state:
- reflect 4789 UDP port default addition to software database
- check maximum number of vxlan UDP ports
- cleanup an unused member in vxlan work

Then Gal provides a performance optimization by replacing the
vxlan radix tree with a hash table.

Measuring mlx5e_vxlan_lookup_port execution time:

                  Radix Tree   Hash Table
 --------------- ------------ ------------
  Single Stream   161 ns       79  ns (51% improvement)
  Multi Stream    259 ns       136 ns (47% improvement)

Measuring UDP stream packet rate, single fully utilized TX core:
Radix Tree: 498,300 PPS
Hash Table: 555,468 PPS (11% improvement)

Next, from Saeed, a vxlan refactoring to allow sharing the vxlan table
between different mlx5 netdevice instances such as PF and VF representors.
This is done by making the mlx5 vxlan interface more generic and decoupling
it from PF netdevice structures and logic, then moving it into mlx5 core
as a low-level interface so it can be used by VF representors, which is
illustrated in the last patch of the series.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

ebe023a4
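
The improvement percentages quoted in the series description can be re-derived from the raw measurements. The following is a minimal sketch, not part of the series; the function names are hypothetical and it only assumes that latency gains are computed as a relative reduction and the packet-rate gain as a relative increase.

```c
#include <stdio.h>

/* Relative reduction of a latency, in percent: (before - after) / before. */
double latency_improvement_pct(double before_ns, double after_ns)
{
    return (before_ns - after_ns) / before_ns * 100.0;
}

/* Relative increase of a rate, in percent: (after - before) / before. */
double rate_improvement_pct(double before_pps, double after_pps)
{
    return (after_pps - before_pps) / before_pps * 100.0;
}

/* Print the three figures using the measurements from the commit message. */
void print_improvements(void)
{
    printf("single stream: %.1f%%\n", latency_improvement_pct(161.0, 79.0));
    printf("multi stream:  %.1f%%\n", latency_improvement_pct(259.0, 136.0));
    printf("packet rate:   %.1f%%\n", rate_improvement_pct(498300.0, 555468.0));
}
```

Under these assumptions the results round to the 51%, 47% and 11% stated above.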

selftests: mlxsw: qos_dscp_bridge: Fix · eef6ab8b

Petr Machata authored Jul 28, 2018

There are two problems in this test case:

- When indexing in bash associative array, the subscript is interpreted as
  string, not as a variable name to be expanded.

- The keys stored to t0s and t1s are not DSCP values, but priority +
  base (i.e. the logical DSCP value, not the full bitfield value).

In combination these two bugs conspire to make the test just work,
except it doesn't really test anything and always passes.

Fix the above two problems in the obvious manner.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

eef6ab8b

Merge branch 'mtu-related-changes' · 720516b1

David S. Miller authored Jul 29, 2018

Stephen Hemminger says:

====================
mtu related changes

While looking at other MTU issues, I noticed a couple of opportunities
for improving the user experience.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

720516b1

net: report invalid mtu value via netlink extack · 7a4c53be

Stephen Hemminger authored Jul 27, 2018

If an invalid MTU value is set through rtnetlink, return extra error
information instead of putting a message in the kernel log. For other cases
where there is no visible API, keep the error report in the log.

Example:
	# ip li set dev enp12s0 mtu 10000
	Error: mtu greater than device maximum.

	# ifconfig enp12s0 mtu 10000
	SIOCSIFMTU: Invalid argument
	# dmesg | tail -1
	[ 2047.795467] enp12s0: mtu greater than device maximum
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

7a4c53be
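
The idea in the commit above (hand the caller a human-readable reason instead of only logging it) can be sketched in userspace C. This is an illustration only: `struct mtu_limits` and `validate_mtu()` are hypothetical names, not the kernel's API, and the "less than device minimum" string is an assumption modeled on the "greater than device maximum" message shown in the example output.

```c
#include <stddef.h>

/* Hypothetical stand-in for the per-device MTU limits. */
struct mtu_limits {
    unsigned int min_mtu;
    unsigned int max_mtu;   /* 0 means "no upper limit" in this sketch */
};

/*
 * Returns 0 if new_mtu is acceptable and -1 otherwise.
 * On failure *msg points at a static string describing the reason, which a
 * netlink caller could report back to user space; on success *msg is NULL.
 */
int validate_mtu(const struct mtu_limits *lim, unsigned int new_mtu,
                 const char **msg)
{
    *msg = NULL;

    if (new_mtu < lim->min_mtu) {
        *msg = "mtu less than device minimum";      /* assumed wording */
        return -1;
    }
    if (lim->max_mtu > 0 && new_mtu > lim->max_mtu) {
        *msg = "mtu greater than device maximum";   /* wording from the example */
        return -1;
    }
    return 0;
}
```

A caller without a message channel (the ioctl path in the example) would ignore `msg` and fall back to logging.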

net: report min and max mtu network device settings · 3e7a50ce

Stephen Hemminger authored Jul 27, 2018

Report the minimum and maximum MTU allowed on a device
via netlink so that it can be displayed by tools like
ip link.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

3e7a50ce

failover: change mtu has RTNL · 3260155a

Stephen Hemminger authored Jul 27, 2018

When changing MTU, RTNL is held so use rtnl_dereference
instead of rcu_dereference.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

3260155a

net: dcb: add DSCP to comment about priority selector types · 4b09384a

Jakub Kicinski authored Jul 27, 2018

Commit ee205981 ("net/dcb: Add dscp to priority selector type")
added a define for the new DSCP selector type created by
IEEE 802.1Qcd, but missed the comment enumerating all selector types.
Update the comment.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4b09384a

net: ethernet: ti: cpsw: add missed RX_CTAG feature for second slave · 193736c8

Ivan Khoronzhuk authored Jul 27, 2018

It seems this was missed when the feature was added for the first net dev in dual-emac mode.
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

193736c8

Merge branch 'route-add-support-and-selftests-for-directed-broadcast-forwarding' · d05c1ce5

David S. Miller authored Jul 29, 2018

Xin Long says:

====================
route: add support and selftests for directed broadcast forwarding

Patch 1/2 is the feature and 2/2 is the selftest. Check the changelog
on each of them to know the details.

v1->v2:
  - fix a typo in changelog.
  - fix an uapi break that Davide noticed.
  - flush route cache when bc_forwarding is changed.
  - add the selftest for this patch as Ido suggested.

v2->v3:
  - fix an incorrect 'if check' in devinet_conf_proc as David Ahern
    noticed.
  - extend the selftest after one David Ahern fix for vrf.

v3->v4:
  - improve the output log in the selftest as David Ahern suggested.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

d05c1ce5

selftests: add a selftest for directed broadcast forwarding · 40f98b9a

Xin Long authored Jul 27, 2018

As Ido suggested, this patch adds a selftest for directed
broadcast forwarding with vrf. It does the assertion by checking
the src IP of the echo-reply packet in ping_test_from.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

40f98b9a

route: add support for directed broadcast forwarding · 5cbf777c

Xin Long authored Jul 27, 2018

This patch implements the feature described in rfc1812#section-5.3.5.2
and rfc2644. It allows the router to forward directed broadcast when
sysctl bc_forwarding is enabled.

Note that this feature could be done by iptables -j TEE, but it would
cause some problems:
  - target TEE's gateway param has to be set with a specific address,
    and it's not flexible, especially when the router wants to forward all
    directed broadcasts.
  - this duplicates the directed broadcasts, so it may cause side
    effects to applications.

Besides, to stay consistent with other OS routers such as BSD, it's also
necessary to implement it in the route rx path.

Note that route cache needs to be flushed when bc_forwarding is
changed.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5cbf777c
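
For readers unfamiliar with the term, a directed broadcast address is the address of a remote subnet with all host bits set. The sketch below shows only that arithmetic, in host byte order; it is not the kernel's routing code and the function names are hypothetical. `prefix_len` is assumed to be in the range 0..32.

```c
#include <stdint.h>

/* Netmask for a prefix length in 0..32, host byte order. */
uint32_t prefix_to_mask(unsigned int prefix_len)
{
    /* Shifting a 32-bit value by 32 is undefined, so handle /0 separately. */
    return prefix_len == 0 ? 0 : (uint32_t)0xFFFFFFFFu << (32 - prefix_len);
}

/* Directed broadcast address of the subnet containing addr. */
uint32_t directed_broadcast(uint32_t addr, unsigned int prefix_len)
{
    return addr | ~prefix_to_mask(prefix_len);
}

/* Non-zero if dst is the directed broadcast of the subnet containing net_addr. */
int is_directed_broadcast(uint32_t dst, uint32_t net_addr,
                          unsigned int prefix_len)
{
    return dst == directed_broadcast(net_addr, prefix_len);
}
```

For instance, for 192.168.1.0/24 the directed broadcast is 192.168.1.255, which a router with bc_forwarding enabled would forward onto that subnet.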

net/ipv6: allow any source address for sendmsg pktinfo with ip_nonlocal_bind · d0c1f011

Vincent Bernat authored Jul 25, 2018

When the freebind feature is set on an IPv6 socket, any source address can
be used when sending UDP datagrams using the IPv6 PKTINFO ancillary
message. A global non-local bind feature was added in commit
35a256fe ("ipv6: Nonlocal bind") for IPv6. This commit also allows
IPv6 source address spoofing when the non-local bind feature is enabled.
Signed-off-by: Vincent Bernat <vincent@bernat.im>
Signed-off-by: David S. Miller <davem@davemloft.net>

d0c1f011

Merge tag 'linux-can-next-for-4.19-20180727' of... · 41627cdb

David S. Miller authored Jul 29, 2018

Merge tag 'linux-can-next-for-4.19-20180727' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
pull-request: can-next 2018-01-16

This is a pull request for net-next/master consisting of 38 patches.

Dan Murphy's patch fixes the path to a file in the comment of the CAN
Error Message Frame Mask structure.

A patch by Colin Ian King fixes a typo in the cc770 driver.

The next patch is by me and sorts the Kconfig and Makefile entries of the
CAN-USB driver subdir alphabetically.

The patch by Jakob Unterwurzacher adds support for the UCAN USB-CAN
adapter.

YueHaibing's patch replaces an open-coded skb_put()+memset() by
skb_put_zero() in the CAN-dev infrastructure.

Zhu Yi provides a patch to enable multi-queue CAN devices.

Three patches by Luc Van Oostenryck fix the return value of several
drivers' xmit functions; I contribute a patch for a fourth driver.

Fabio Estevam's patch switches the flexcan driver to SPDX identifier.

Two patches by Jia-Ju Bai replace the mdelay() by a usleep_range() in
the sja1000 drivers.

The next 6 patches are by Anssi Hannula and refactor the xilinx CAN
driver and add support for the xilinx CAN FD core.

A patch by Gustavo A. R. Silva adds fallthrough annotation to the
peak_usb driver.

5 patches by Stephane Grosjean for the peak CANFD driver do some
cleanups and provide more improvements for further firmware releases.

The remaining 13 patches are by Jimmy Assarsson. The first ones clean up
the kvaser_usb driver, so that the later patches can add support for the
Kvaser USB hydra family.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

41627cdb

qed: remove redundant functions qed_set_gft_event_id_cm_hdr · 4be90c79

YueHaibing authored Jul 27, 2018

There are no in-tree callers of qed_set_gft_event_id_cm_hdr.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4be90c79

liquidio: remove redundant function cn23xx_dump_vf_iq_regs · 5ae42de5

YueHaibing authored Jul 27, 2018

There are no in-tree callers.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5ae42de5

Merge branch 'tls-Fix-improper-revert-in-zerocopy_from_iter' · d339377c

David S. Miller authored Jul 28, 2018

Doron Roberts-Kedes says:

====================
tls: Fix improper revert in zerocopy_from_iter

This series fixes the improper iov_iter_revert introduced in
"tls: Fix zerocopy_from_iter iov handling".
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

d339377c

tls: Fix improper revert in zerocopy_from_iter · 2da19ed3

Doron Roberts-Kedes authored Jul 26, 2018

The current code is problematic because the iov_iter is reverted and
never advanced in the non-error case. This patch skips the revert in the
non-error case. This patch also fixes the amount by which the iov_iter
is reverted. Currently, iov_iter is reverted by size, which can be
greater than the amount by which the iter was actually advanced.
Instead, only revert by the amount that the iter was advanced.

Fixes: 47187998 ("tls: Fix zerocopy_from_iter iov handling")
Signed-off-by: Doron Roberts-Kedes <doronrk@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2da19ed3
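
The rule this fix establishes (revert only on error, and only by the amount actually advanced) can be modeled with a toy iterator. This is a userspace sketch under stated assumptions: `struct toy_iter` and `toy_consume()` are hypothetical and far simpler than the kernel's iov_iter, and `fail_after` exists only to simulate an error partway through.

```c
#include <stddef.h>

/* Hypothetical, minimal stand-in for an iterator over a byte range. */
struct toy_iter {
    size_t pos;   /* current offset */
    size_t len;   /* total length   */
};

/*
 * Consume `want` bytes in chunks of at most `chunk` bytes.
 * An error is simulated once `fail_after` bytes have been consumed
 * (pass (size_t)-1 to never inject one); running out of data or a zero
 * chunk size is also an error.
 * Returns 0 on success and -1 on failure. On failure the iterator is
 * rewound by exactly the number of bytes this call advanced it; on
 * success it is left advanced.
 */
int toy_consume(struct toy_iter *it, size_t want, size_t chunk,
                size_t fail_after)
{
    size_t advanced = 0;
    int rc = 0;

    while (advanced < want) {
        size_t n = want - advanced;

        if (n > chunk)
            n = chunk;
        if (n > it->len - it->pos)
            n = it->len - it->pos;
        if (n == 0 || advanced >= fail_after) {
            rc = -1;
            break;
        }
        it->pos += n;
        advanced += n;
    }

    if (rc)
        it->pos -= advanced;   /* not `want`: only what was really advanced */
    return rc;
}
```

Reverting by `want` instead of `advanced` on the error path would move the iterator before its starting position, which is the kind of over-revert the commit message describes.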

tls: Remove dead code in tls_sw_sendmsg · 5a3611ef

Doron Roberts-Kedes authored Jul 26, 2018

tls_push_record either returns 0 on success or a negative value on failure.
This patch removes code that would only be executed if tls_push_record
were to return a positive value.
Signed-off-by: Doron Roberts-Kedes <doronrk@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5a3611ef

Merge branch 'mvneta-next' · 37b81dc5

David S. Miller authored Jul 28, 2018

Gregory CLEMENT says:

====================
A fix and a few improvements on mvneta

This series brings some improvements for the mvneta driver and also
adds a fix.

Compared to v2, the main change is another patch fixing a bug
in mtu_change.

Changelog:
v1 -> v2

 - In patch 2, use EXPORT_SYMBOL_GPL for mvneta_bm_get and
   mvneta_bm_put to be used in module, reported by kbuild test robot.

 - In patch 4, add the counter to the driver's ethtool state,
   suggested by David Miller.

 - In patch 6, use a single if, suggested by Marcin Wojtas

v2 -> v3

 - Adding a patch fixing the mtu change issue

 - Removing the inline keyword for mvneta_rx_refill() and letting the
   compiler decide, suggested by David Miller.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

37b81dc5

net: mvneta: Improve the buffer allocation method for SWBM · 562e2f46

Yelena Krivosheev authored Jul 18, 2018

On systems with little memory (around 256MB), the state "cannot
allocate memory to refill with new buffer" is reached pretty quickly.

This patch changes the buffer allocation method to handle this use case
better by avoiding memory allocation issues.
Signed-off-by: Yelena Krivosheev <yelena@marvell.com>
[gregory: extract from a larger patch]
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

562e2f46

net: mvneta: Verify hardware checksum only when offload checksum feature is set · f945cec8

Yelena Krivosheev authored Jul 18, 2018

If the checksum offload feature is not set, then there is no point in
checking the status of the hardware.

[gregory: extract from a larger patch]
Signed-off-by: Yelena Krivosheev <yelena@marvell.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f945cec8

net: mvneta: Allocate page for the descriptor · 7e47fd84

Gregory CLEMENT authored Jul 18, 2018

Instead of trying to allocate the exact amount of memory for each
descriptor, use a page for each of them. This simplifies the
allocation management and increases the performance of the driver.

Based on the work of Yelena Krivosheev <yelena@marvell.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7e47fd84

net: mvneta: discriminate error cause for missed packet · 17a96da6

Gregory CLEMENT authored Jul 18, 2018

In order to improve diagnostics in case of error, distinguish between
refill errors and skb allocation errors. Also make the information
available through the ethtool state.

Based on the work of Yelena Krivosheev <yelena@marvell.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

17a96da6

net: mvneta: increase number of buffers in RX and TX queue · c307e2a8

Yelena Krivosheev authored Jul 18, 2018

The initial values were too small, leading to poor performance when using
the software buffer management.
Signed-off-by: Yelena Krivosheev <yelena@marvell.com>
[gregory: extract from a larger patch]
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c307e2a8

net: mvneta: remove data pointer usage from device_node structure · 965cbbec

Gregory CLEMENT authored Jul 18, 2018

One year ago Rob Herring wanted to remove the data pointer from the
device_node structure[1]. The mvneta driver seemed to be the only one
which used (abused?) it. However, Rob's proposal to remove this
pointer from the driver introduced a regression. I tested and fixed an
alternative approach, but it was never submitted as a proper patch.

Now here it is: Instead of using the device_node structure ->data
pointer, we store the BM private data as the driver data of the BM
platform_device. The core mvneta code can retrieve it by doing a lookup
on which platform_device corresponds to the BM device tree node using
of_find_device_by_node(), and get its driver data.

[1] https://www.spinics.net/lists/netdev/msg445197.html
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

965cbbec

net: mvneta: fix mtu change on port without link · 8466baf7

Yelena Krivosheev authored Jul 18, 2018

It is incorrect to enable TX/RX queues (called by mvneta_port_up()) for
a port without link. Indeed, an MTU change on an interface without link
causes TX queues to get stuck.

Fixes: c5aff182 ("net: mvneta: driver for Marvell Armada 370/XP network unit")
Signed-off-by: Yelena Krivosheev <yelena@marvell.com>
[gregory.clement: adding Fixes tags and rewording commit log]
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8466baf7

net: ethernet: mvneta: Fix napi structure mixup on armada 3700 · 7a86f05f

Andrew Lunn authored Jul 18, 2018

The mvneta Ethernet driver is used on a few different Marvell SoCs.
Some SoCs have per cpu interrupts for Ethernet events. Some SoCs have
a single interrupt, independent of the CPU. The driver handles this by
having a per CPU napi structure when there are per CPU interrupts, and
a global napi structure when there is a single interrupt.

When the napi core calls mvneta_poll(), it passes the napi
instance. This was not being propagated through the call chain, and
instead the per-cpu napi instance was passed to napi_gro_receive()
call. This breaks when there is a single global napi instance.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Fixes: 2636ac3c ("net: mvneta: Add network support for Armada 3700 SoC")
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7a86f05f

27 Jul, 2018 13 commits

net/mlx5e: Issue direct lookup on vxlan ports by vport representors · a3e67366

Saeed Mahameed authored May 19, 2018

Remove uplink representor netdevice private structure lookup, and use
mlx5 core handle directly from representor private structure to lookup
vxlan ports.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>

a3e67366

net/mlx5e: Vxlan, move vxlan logic to core driver · 358aa5ce

Saeed Mahameed authored May 09, 2018

Move vxlan logic and objects to the mlx5 core driver,
since it is going to be used from different mlx5 interfaces,
e.g. mlx5e PF NIC netdev and mlx5e E-Switch representors.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>

358aa5ce

net/mlx5e: Vxlan, add sync lock for add/del vxlan port · aec4eab9

Saeed Mahameed authored May 08, 2018

The vxlan API can and will be called from different mlx5 modules, so we
should not count on the mlx5e private state lock only. Hence we introduce a
vxlan private mutex to sync between add/del vxlan port operations.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>

aec4eab9

net/mlx5e: Vxlan, return values for add/del port · 1b318a92

Saeed Mahameed authored May 08, 2018

mlx5_vxlan_{add/del}_port can fail; for a better API, make them return
error values.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>

1b318a92

net/mlx5e: Vxlan, rename from mlx5e to mlx5 · a3c785d7

Saeed Mahameed authored May 08, 2018

Rename vxlan functions from mlx5e_vxlan_* to mlx5_vxlan_*.
Rename mlx5e_vxlan_db to mlx5_vxlan and move it from en.h to vxlan.c
since it is not related to mlx5e anymore.

Allocate mlx5_vxlan structure dynamically in order to make it easier to
move later to core driver and to make it private in vxlan.c.

This is in preparation to move vxlan API to mlx5 core.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>

a3c785d7

net/mlx5e: Vxlan, rename struct mlx5e_vxlan to mlx5_vxlan_port · 5006eb22

Saeed Mahameed authored May 08, 2018

The name mlx5e_vxlan will be used in a downstream patch to describe
the mlx5 vxlan structure that will replace mlx5e_vxlan_db.

Hence we rename struct mlx5e_vxlan to mlx5_vxlan_port which describes a
mlx5 vxlan port.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>

5006eb22

net/mlx5e: Vxlan, move netdev only logic to en_main.c · dccea6bf

Saeed Mahameed authored May 08, 2018

Create a direct vxlan API to add and delete vxlan ports from HW.
+void mlx5e_vxlan_add_port(struct mlx5e_priv *priv, u16 port);
+void mlx5e_vxlan_del_port(struct mlx5e_priv *priv, u16 port);

And move vxlan_add/del_work to en_main.c since they are netdev only
logic.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>

dccea6bf

net/mlx5e: Vxlan, add direct delete function · 0f647bfc

Saeed Mahameed authored May 07, 2018

Add direct vxlan delete function to be called from vxlan_delete_work.
Needed in downstream patch.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>

0f647bfc

net/mlx5e: Vxlan, cleanup an unused member in vxlan work · 278d7f3d

Gal Pressman authored Feb 13, 2018

Clean up the sa_family member of the vxlan work; it is not used or needed
anywhere in the code.
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

278d7f3d

net/mlx5e: Vxlan, replace ports radix-tree with hash table · d30d8cde

Gal Pressman authored Dec 26, 2017

The VXLAN database is accessed in the data path for each VXLAN TX skb in
order to check whether the UDP port is being offloaded or not.
Since the number of elements in the database is relatively small, we can
replace the radix tree with a simpler hash table and speed up the lookup.

Measuring mlx5e_vxlan_lookup_port execution time:

                  Radix Tree   Hash Table
 --------------- ------------ ------------
  Single Stream   161 ns       79  ns (51% improvement)
  Multi Stream    259 ns       136 ns (47% improvement)

Measuring UDP stream packet rate, single fully utilized TX core:
Radix Tree: 498,300 PPS
Hash Table: 555,468 PPS (11% improvement)
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

d30d8cde
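
To illustrate why a small hash table suits this lookup, here is a userspace sketch of a chained hash table keyed by UDP port. It is not the driver's code: the kernel uses its own hashtable helpers, and every name below (`port_table`, `port_hash()`, the bucket count of 16) is an assumption made for the example.

```c
#include <stdint.h>
#include <stdlib.h>

#define PORT_HASH_BITS 4                       /* assumed: 16 buckets */
#define PORT_HASH_SIZE (1u << PORT_HASH_BITS)

struct port_entry {
    uint16_t port;
    struct port_entry *next;                   /* chain within one bucket */
};

struct port_table {
    struct port_entry *buckets[PORT_HASH_SIZE];
};

/* Fold the high byte into the low bits; any cheap mixing function would do. */
static unsigned int port_hash(uint16_t port)
{
    return (unsigned int)(port ^ (port >> 8)) & (PORT_HASH_SIZE - 1);
}

/* Returns 1 if the port is present, 0 otherwise. Walks one short chain. */
int port_table_lookup(const struct port_table *t, uint16_t port)
{
    const struct port_entry *e;

    for (e = t->buckets[port_hash(port)]; e; e = e->next)
        if (e->port == port)
            return 1;
    return 0;
}

/* Returns 0 on success (or if already present), -1 on allocation failure. */
int port_table_add(struct port_table *t, uint16_t port)
{
    unsigned int b = port_hash(port);
    struct port_entry *e;

    if (port_table_lookup(t, port))
        return 0;
    e = malloc(sizeof(*e));
    if (!e)
        return -1;
    e->port = port;
    e->next = t->buckets[b];
    t->buckets[b] = e;
    return 0;
}

/* Removes the port if present; does nothing otherwise. */
void port_table_del(struct port_table *t, uint16_t port)
{
    struct port_entry **pp = &t->buckets[port_hash(port)];

    while (*pp) {
        if ((*pp)->port == port) {
            struct port_entry *dead = *pp;

            *pp = dead->next;
            free(dead);
            return;
        }
        pp = &(*pp)->next;
    }
}
```

With only a handful of offloaded ports, most buckets hold at most one entry, so a lookup is one hash computation plus a comparison or two.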

net/mlx5e: Vxlan, check maximum number of UDP ports · 22a65aa8

Gal Pressman authored Dec 25, 2017

The NIC has a limited number of offloaded VXLAN UDP ports (usually 4).
Instead of letting the firmware fail when trying to add more ports than
it can handle, let the driver check it on its own.
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

22a65aa8
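
The driver-side limit check described above might look roughly like the following userspace sketch. The names are hypothetical, the limit of 4 is taken from the "usually 4" in the commit message rather than queried from a device, and returning -ENOSPC is an assumption about how a caller would be told.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_OFFLOADED_PORTS 4   /* assumed fixed; real hardware reports its own limit */

/* Hypothetical software database mirroring the ports offloaded to hardware. */
struct port_db {
    uint16_t ports[MAX_OFFLOADED_PORTS];
    unsigned int num_ports;
};

/*
 * Returns 0 if the port is (now) tracked, or -ENOSPC if the table is full.
 * The point of the check is to refuse in software before issuing a device
 * command that would fail anyway.
 */
int port_db_add(struct port_db *db, uint16_t port)
{
    unsigned int i;

    for (i = 0; i < db->num_ports; i++)
        if (db->ports[i] == port)
            return 0;                   /* already offloaded */

    if (db->num_ports >= MAX_OFFLOADED_PORTS)
        return -ENOSPC;

    db->ports[db->num_ports++] = port;
    return 0;
}
```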

net/mlx5e: Vxlan, reflect 4789 UDP port default addition to software database · a082c4f4

Gal Pressman authored Jan 17, 2018

The hardware offloads 4789 UDP port (default VXLAN port) automatically.
Add it to the software database as well in order to reflect the hardware
state appropriately.
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

a082c4f4

net: tipc: bcast: Replace GFP_ATOMIC with GFP_KERNEL in tipc_bcast_init() · a0732548

Jia-Ju Bai authored Jul 27, 2018

tipc_bcast_init() is never called in atomic context.
It calls kzalloc() with GFP_ATOMIC, which is not necessary.
GFP_ATOMIC can be replaced with GFP_KERNEL.

This is found by a static analysis tool named DCNS written by myself.
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a0732548