Commits · 713865da3c623034a46b382b5d544048edcce827 · Kirill Smelkov / linux

10 Sep, 2020 20 commits

net: ena: ethtool: Add new device statistics · 713865da

Sameeh Jubran authored Sep 10, 2020

The new metrics provide granular visibility along multiple network
dimensions and enable troubleshooting and remediation of issues caused
by instances exceeding network performance allowances.

The new statistics can be queried using ethtool command.
Signed-off-by: Guy Tzalik <gtzalik@amazon.com>
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

713865da

net: ena: ethtool: convert stat_offset to 64 bit resolution · f1852d64

Sameeh Jubran authored Sep 10, 2020

The type of all stat fields is u64, therefore when iterating over stat
fields in a stats struct, it makes sense to use an offset in 64 bit
resolution. Doing so allows us to drop some of the casting that is
currently used when referencing stats.
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f1852d64

selftests/mptcp: Better delay & reordering configuration · e5484658

Christoph Paasch authored Sep 09, 2020

The delay was intended to be configured to "simulate" a high(er) BDP
link. As such, it needs to be set as part of the loss-configuration and
not as part of the netem reordering configuration.

The reordering-config also requires a delay but that delay is the
reordering-extend. So, a good approach is to set the reordering-extend
as a function of the configured latency. E.g., 25% of the overall
latency.

To speed up the selftests, we limit the delay to 50ms maximum to avoid
having the selftests run for too long.

Finally, the intention of tc_reorder was that when it is unset, the test
picks a random configuration. However, currently it is always initialized
and thus the random config won't be picked up.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/6Reported-and-reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e5484658

Merge branch 'tcp-add-tos-reflection-feature' · d095c462

David S. Miller authored Sep 10, 2020

Wei Wang says:

====================
tcp: add tos reflection feature

This patch series adds a new tcp feature to reflect TOS value received in
SYN, and send it out in SYN-ACK, and eventually set the TOS value of the
established socket with this reflected TOS value. This provides a way to
set the traffic class/QoS level for all traffic in the same connection
to be the same as the incoming SYN. It could be useful for datacenters
to provide equivalent QoS according to the incoming request.
This feature is guarded by /proc/sys/net/ipv4/tcp_reflect_tos, and is by
default turned off.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

d095c462

tcp: reflect tos value received in SYN to the socket · ac8f1710

Wei Wang authored Sep 09, 2020

This commit adds a new TCP feature to reflect the tos value received in
SYN, and send it out on the SYN-ACK, and eventually set the tos value of
the established socket with this reflected tos value. This provides a
way to set the traffic class/QoS level for all traffic in the same
connection to be the same as the incoming SYN request. It could be
useful in data centers to provide equivalent QoS according to the
incoming request.
This feature is guarded by /proc/sys/net/ipv4/tcp_reflect_tos, and is by
default turned off.
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ac8f1710

ip: pass tos into ip_build_and_send_pkt() · de033b7d

Wei Wang authored Sep 09, 2020

This commit adds tos as a new passed in parameter to
ip_build_and_send_pkt() which will be used in the later commit.
This is a pure restructure and does not have any functional change.
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

de033b7d

tcp: record received TOS value in the request socket · e9b12edc

Wei Wang authored Sep 09, 2020

A new field is added to the request sock to record the TOS value
received on the listening socket during 3WHS:
When not under syn flood, it is recording the TOS value sent in SYN.
When under syn flood, it is recording the TOS value sent in the ACK.
This is a preparation patch in order to do TOS reflection in the later
commit.
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e9b12edc

net: mventa: drop mvneta_stats from mvneta_swbm_rx_frame signature · 3a8c4ad1

Lorenzo Bianconi authored Sep 09, 2020

Remove mvneta_stats from mvneta_swbm_rx_frame signature since now stats
are accounted in mvneta_run_xdp routine
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

3a8c4ad1

Merge branch 'netpoll-make-sure-napi_list-is-safe-for-RCU-traversal' · 6198f446

David S. Miller authored Sep 10, 2020

Jakub Kicinski says:

====================
netpoll: make sure napi_list is safe for RCU traversal

This series is a follow-up to the fix in commit 96e97bc0 ("net:
disable netpoll on fresh napis"). To avoid any latent race conditions
convert dev->napi_list to a proper RCU list. We need minor restructuring
because it looks like netif_napi_del() used to be idempotent, and
it may be quite hard to track down everyone who depends on that.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

6198f446

net: make sure napi_list is safe for RCU traversal · 5251ef82

Jakub Kicinski authored Sep 09, 2020

netpoll needs to traverse dev->napi_list under RCU, make
sure it uses the right iterator and that removal from this
list is handled safely.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

5251ef82

net: manage napi add/del idempotence explicitly · 4d092dd2

Jakub Kicinski authored Sep 09, 2020

To RCUify napi->dev_list we need to replace list_del_init()
with list_del_rcu(). There is no _init() version for RCU for
obvious reasons. Up until now netif_napi_del() was idempotent
so to make sure it remains such add a bit which is set when
NAPI is listed, and cleared when it removed. Since we don't
expect multiple calls to netif_napi_add() to be correct,
add a warning on that side.

Now that napi_hash_add / napi_hash_del are only called by
napi_add / del we can actually steal its bit. We just need
to make sure hash node is initialized correctly.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

4d092dd2

net: remove napi_hash_del() from driver-facing API · 5198d545

Jakub Kicinski authored Sep 09, 2020

We allow drivers to call napi_hash_del() before calling
netif_napi_del() to batch RCU grace periods. This makes
the API asymmetric and leaks internal implementation details.
Soon we will want the grace period to protect more than just
the NAPI hash table.

Restructure the API and have drivers call a new function -
__netif_napi_del() if they want to take care of RCU waits.

Note that only core was checking the return status from
napi_hash_del() so the new helper does not report if the
NAPI was actually deleted.

Some notes on driver oddness:
 - veth observed the grace period before calling netif_napi_del()
   but that should not matter
 - myri10ge observed normal RCU flavor
 - bnx2x and enic did not actually observe the grace period
   (unless they did so implicitly)
 - virtio_net and enic only unhashed Rx NAPIs

The last two points seem to indicate that the calls to
napi_hash_del() were a left over rather than an optimization.
Regardless, it's easy enough to correct them.

This patch may introduce extra synchronize_net() calls for
interfaces which set NAPI_STATE_NO_BUSY_POLL and depend on
free_netdev() to call netif_napi_del(). This seems inevitable
since we want to use RCU for netpoll dev->napi_list traversal,
and almost no drivers set IFF_DISABLE_NETPOLL.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

5198d545

Merge branch 'mlx4-avoid-devlink-port-type-not-set-warnings' · 8b40f21b

David S. Miller authored Sep 10, 2020

Jakub Kicinski says:

====================
mlx4: avoid devlink port type not set warnings

This small set addresses the issue of mlx4 potentially not setting
devlink port type when Ethernet or IB driver is not built, but
port has that type.

v2:
 - add patch 1
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

8b40f21b

mlx4: make sure to always set the port type · 0313c7c2

Jakub Kicinski authored Sep 08, 2020

Even tho mlx4_core registers the devlink ports, it's mlx4_en
and mlx4_ib which set their type. In situations where one of
the two is not built yet the machine has ports of given type
we see the devlink warning from devlink_port_type_warn() trigger.

Having ports of a type not supported by the kernel may seem
surprising, but it does occur in practice - when the unsupported
port is not plugged in to a switch anyway users are more than happy
not to see it (and potentially allocate any resources to it).

Set the type in mlx4_core if type-specific driver is not built.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0313c7c2

devlink: don't crash if netdev is NULL · 3ea87ca7

Jakub Kicinski authored Sep 08, 2020

Following change will add support for a corner case where
we may not have a netdev to pass to devlink_port_type_eth_set()
but we still want to set port type.

This is definitely a corner case, and drivers should not normally
pass NULL netdev - print a warning message when this happens.

Sadly for other port types (ib) switches don't have a device
reference, the way we always do for Ethernet, so we can't put
the warning in __devlink_port_type_set().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

3ea87ca7

net: mvneta: rely on MVNETA_MAX_RX_BUF_SIZE for pkt split in mvneta_swbm_rx_frame() · 6eb8b7fb

Lorenzo Bianconi authored Sep 08, 2020

In order to easily change the rx buffer size, rely on
MVNETA_MAX_RX_BUF_SIZE instead of PAGE_SIZE in mvneta_swbm_rx_frame
routine for rx buffer split. Currently this is not an issue since we set
MVNETA_MAX_RX_BUF_SIZE to PAGE_SIZE - MVNETA_SKB_PAD but it is a good to
have to configure a different rx buffer size.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

6eb8b7fb

Merge branch 'Allow-more-than-255-IPv4-multicast-interfaces' · 8c5c49a6

David S. Miller authored Sep 10, 2020

Paul Davey says:

====================
Allow more than 255 IPv4 multicast interfaces

Currently it is not possible to use more than 255 multicast interfaces
for IPv4 due to the format of the igmpmsg header which only has 8 bits
available for the VIF ID.  There is space available in the igmpmsg
header to store the full VIF ID in the form of an unused byte following
the VIF ID field.  There is also enough space for the full VIF ID in
the Netlink cache notifications, however the value is currently taken
directly from the igmpmsg header and has thus already been truncated.

Adding the high byte of the VIF ID into the unused3 byte of igmpmsg
allows use of more than 255 IPv4 multicast interfaces. The full VIF ID
is  also available in the Netlink notification by assembling it from
both bytes from the igmpmsg.

Additionally this reveals a deficiency in the Netlink cache report
notifications, they lack any means for differentiating cache reports
relating to different multicast routing tables.  This is easily
resolved by adding the multicast route table ID to the cache reports.

changes in v2:
 - Added high byte of VIF ID to igmpmsg struct replacing unused3
   member.
 - Assemble VIF ID in Netlink notification from both bytes in igmpmsg
   header.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

8c5c49a6

ipmr: Use full VIF ID in netlink cache reports · bb82067c

Paul Davey authored Sep 08, 2020

Insert the full 16 bit VIF ID into ipmr Netlink cache reports.

The VIF_ID attribute has 32 bits of space so can store the full VIF ID
extracted from the high and low byte fields in the igmpmsg.
Signed-off-by: Paul Davey <paul.davey@alliedtelesis.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>

bb82067c

ipmr: Add high byte of VIF ID to igmpmsg · c8715a8e

Paul Davey authored Sep 08, 2020

Use the unused3 byte in struct igmpmsg to hold the high 8 bits of the
VIF ID.

If using more than 255 IPv4 multicast interfaces it is necessary to have
access to a VIF ID for cache reports that is wider than 8 bits, the VIF
ID present in the igmpmsg reports sent to mroute_sk was only 8 bits wide
in the igmpmsg header.  Adding the high 8 bits of the 16 bit VIF ID in
the unused byte allows use of more than 255 IPv4 multicast interfaces.
Signed-off-by: Paul Davey <paul.davey@alliedtelesis.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>

c8715a8e

ipmr: Add route table ID to netlink cache reports · 501cb008

Paul Davey authored Sep 08, 2020

Insert the multicast route table ID as a Netlink attribute to Netlink
cache report notifications.

When multiple route tables are in use it is necessary to have a way to
determine which route table a given cache report belongs to when
receiving the cache report.
Signed-off-by: Paul Davey <paul.davey@alliedtelesis.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>

501cb008

09 Sep, 2020 20 commits

net: dsa: b53: Report VLAN table occupancy via devlink · 4f6a5caf

Florian Fainelli authored Sep 09, 2020

We already maintain an array of VLANs used by the switch so we can
simply iterate over it to report the occupancy via devlink.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

4f6a5caf

Merge branch 'Marvell-PP2-2-PTP-support' · 4a056990

David S. Miller authored Sep 09, 2020

Russell King says:

====================
Marvell PP2.2 PTP support

This series adds PTP support for PP2.2 hardware to the mvpp2 driver.
Tested on the Macchiatobin eth1 port.

Note that on the Macchiatobin, eth0 uses a separate TAI block from
eth1, and there is no hardware synchronisation between the two.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

4a056990

net: mvpp2: ptp: add support for transmit timestamping · f5015a59

Russell King authored Sep 09, 2020

Add support for timestamping transmit packets.  We allocate SYNC
messages to queue 1, every other message to queue 0.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

f5015a59

net: mvpp2: ptp: add support for receive timestamping · ce3497e2

Russell King authored Sep 09, 2020

Add support for receive timestamping. When enabled, the hardware adds
a timestamp into the receive queue descriptor for all received packets
with no filtering. Hence, we can only support NONE or ALL receive
filter modes.

The timestamp in the receive queue contains two bit sof seconds and
the full nanosecond timestamp. This has to be merged with the remainder
of the seconds from the TAI clock to arrive at a full timestamp before
we can convert it to a ktime for the skb hardware timestamp field.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ce3497e2

net: mvpp2: ptp: add TAI support · 91dd7195

Russell King authored Sep 09, 2020

Add support for the TAI block in the mvpp2.2 hardware.
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

91dd7195

net: mvpp2: check first level interrupt status registers · b4b17714

Russell King authored Sep 09, 2020

Check the first level interrupt status registers to determine how to
further process the port interrupt. We will need this to know whether
to invoke the link status processing and/or the PTP processing for
both XLG and GMAC.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

b4b17714

net: mvpp2: rename mis-named "link status" interrupt · 89141972

Russell King authored Sep 09, 2020

The link interrupt is used for way more than just the link status; it
comes from a collection of units to do with the port. The Marvell
documentation describes the interrupt as "GOP port X interrupt".

Since we are adding PTP support, and the PTP interrupt uses this,
rename it to be more inline with the documentation.

This interrupt is also mis-named in the DT binding, but we leave that
alone.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

89141972

net: mvpp2: restructure "link status" interrupt handling · 36cfd3a6

Russell King authored Sep 09, 2020

The "link status" interrupt is used for more than just link status.
Restructure mvpp2_link_status_isr() so we can add additional handling.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

36cfd3a6

Merge branch 'devlink-show-controller-number' · b599a5b9

David S. Miller authored Sep 09, 2020

Parav Pandit says:

====================
devlink show controller number

Currently a devlink instance that supports an eswitch handles eswitch
ports of two type of controllers.
(1) controller discovered on same system where eswitch resides.
This is the case where PCI PF/VF of a controller and devlink eswitch
instance both are located on a single system.
(2) controller located on external system.
This is the case where a controller is plugged in one system and its
devlink eswitch ports are located in a different system. In this case
devlink instance of the eswitch only have access to ports of the
controller.
However, there is no way to describe that a eswitch devlink port
belongs to which controller (mainly which external host controller).
This problem is more prevalent when port attribute such as PF and VF
numbers are overlapping between multiple controllers of same eswitch.
Due to this, for a specific switch_id, unique phys_port_name cannot
be constructed for such devlink ports.

This short series overcomes this limitation by defining two new
attributes.
(a) external: Indicates if port belongs to external controller
(b) controller number: Indicates a controller number of the port

Based on this a unique phys_port_name is prepared using controller
number.

phys_port_name construction using unique controller number is only
applicable to external controller ports. This ensures that for
non smartnic usecases where there is no external controller,
phys_port_name stays same as before.

Patch summary:
Patch-1 Added mlx5 driver to read controller number
Patch-2 Adds the missing comment for the port attributes
Patch-3 Move structure comments away from structure fields
Patch-4 external attribute added for PCI port flavours
Patch-5 Add controller number
Patch-6 Use controller number to build phys_port_name

---
Changelog:
v2->v3:
 - Updated diagram to get rid of controller 'A' and 'B'
 - Kept ports of single controller together in diagram
 - Updated diagram for pf1's VF and SF and its ports
v1->v2:
 - Added text diagram of multiple controllers
 - Updated example for a VF
 - Addressed comments from Jiri and Jakub
 - Moved controller number attribute to PCI port flavours
   This enables to better, hirerchical view with controller and its
    PF, VF numbers
 - Split 'external' and 'controller number' attributes as two
   different attributes
 - Merged mlx5_core driver to avoid compiliation break
====================
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

b599a5b9

devlink: Use controller while building phys_port_name · 66b17082

Parav Pandit authored Sep 09, 2020

Now that controller number attribute is available, use it when
building phsy_port_name for external controller ports.

An example devlink port and representor netdev name consist of controller
annotation for external controller with controller number = 1,
for a VF 1 of PF 0:

$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev ens2f0c1pf0vf1 flavour pcivf controller 1 pfnum 0 vfnum 1 external true splittable false
  function:
    hw_addr 00:00:00:00:00:00

$ devlink port show pci/0000:06:00.0/2 -jp
{
    "port": {
        "pci/0000:06:00.0/2": {
            "type": "eth",
            "netdev": "ens2f0c1pf0vf1",
            "flavour": "pcivf",
            "controller": 1,
            "pfnum": 0,
            "vfnum": 1,
            "external": true,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:00:00"
            }
        }
    }
}

Controller number annotation is skipped for non external controllers to
maintain backward compatibility.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

66b17082

devlink: Introduce controller number · 3a2d9588

Parav Pandit authored Sep 09, 2020

A devlink port may be for a controller consist of PCI device.
A devlink instance holds ports of two types of controllers.
(1) controller discovered on same system where eswitch resides
This is the case where PCI PF/VF of a controller and devlink eswitch
instance both are located on a single system.
(2) controller located on external host system.
This is the case where a controller is located in one system and its
devlink eswitch ports are located in a different system.

When a devlink eswitch instance serves the devlink ports of both
controllers together, PCI PF/VF numbers may overlap.
Due to this a unique phys_port_name cannot be constructed.

For example in below such system controller-0 and controller-1, each has
PCI PF pf0 whose eswitch ports can be present in controller-0.
These results in phys_port_name as "pf0" for both.
Similar problem exists for VFs and upcoming Sub functions.

An example view of two controller systems:

             ---------------------------------------------------------
             |                                                       |
             |           --------- ---------         ------- ------- |
-----------  |           | vf(s) | | sf(s) |         |vf(s)| |sf(s)| |
| server  |  | -------   ----/---- ---/----- ------- ---/--- ---/--- |
| pci rc  |=== | pf0 |______/________/       | pf1 |___/_______/     |
| connect |  | -------                       -------                 |
-----------  |     | controller_num=1 (no eswitch)                   |
             ------|--------------------------------------------------
             (internal wire)
                   |
             ---------------------------------------------------------
             | devlink eswitch ports and reps                        |
             | ----------------------------------------------------- |
             | |ctrl-0 | ctrl-0 | ctrl-0 | ctrl-0 | ctrl-0 |ctrl-0 | |
             | |pf0    | pf0vfN | pf0sfN | pf1    | pf1vfN |pf1sfN | |
             | ----------------------------------------------------- |
             | |ctrl-1 | ctrl-1 | ctrl-1 | ctrl-1 | ctrl-1 |ctrl-1 | |
             | |pf1    | pf1vfN | pf1sfN | pf1    | pf1vfN |pf0sfN | |
             | ----------------------------------------------------- |
             |                                                       |
             |                                                       |
             |           --------- ---------         ------- ------- |
             |           | vf(s) | | sf(s) |         |vf(s)| |sf(s)| |
             | -------   ----/---- ---/----- ------- ---/--- ---/--- |
             | | pf0 |______/________/       | pf1 |___/_______/     |
             | -------                       -------                 |
             |                                                       |
             |  local controller_num=0 (eswitch)                     |
             ---------------------------------------------------------

An example devlink port for external controller with controller
number = 1 for a VF 1 of PF 0:

$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev ens2f0pf0vf1 flavour pcivf controller 1 pfnum 0 vfnum 1 external true splittable false
  function:
    hw_addr 00:00:00:00:00:00

$ devlink port show pci/0000:06:00.0/2 -jp
{
    "port": {
        "pci/0000:06:00.0/2": {
            "type": "eth",
            "netdev": "ens2f0pf0vf1",
            "flavour": "pcivf",
            "controller": 1,
            "pfnum": 0,
            "vfnum": 1,
            "external": true,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:00:00"
            }
        }
    }
}
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3a2d9588

devlink: Introduce external controller flag · 05b595e9

Parav Pandit authored Sep 09, 2020

A devlink eswitch port may represent PCI PF/VF ports of a controller.

A controller either located on same system or it can be an external
controller located in host where such NIC is plugged in.

Add the ability for driver to specify if a port is for external
controller.

Use such flag in the mlx5_core driver.

An example of an external controller having VF1 of PF0 belong to
controller 1.

$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev ens2f0pf0vf1 flavour pcivf pfnum 0 vfnum 1 external true splittable false
  function:
    hw_addr 00:00:00:00:00:00
$ devlink port show pci/0000:06:00.0/2 -jp
{
    "port": {
        "pci/0000:06:00.0/2": {
            "type": "eth",
            "netdev": "ens2f0pf0vf1",
            "flavour": "pcivf",
            "pfnum": 0,
            "vfnum": 1,
            "external": true,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:00:00"
            }
        }
    }
}
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

05b595e9

devlink: Move structure comments outside of structure · ff03e63a

Parav Pandit authored Sep 09, 2020

To add more fields to the PCI PF and VF port attributes, follow standard
structure comment format.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ff03e63a

devlink: Add comment block for missing port attributes · 2efbe6ae

Parav Pandit authored Sep 09, 2020

Add comment block for physical, PF and VF port attributes.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2efbe6ae

net/mlx5: E-switch, Read controller number from device · a53cf949

Parav Pandit authored Sep 09, 2020

ECPF supports one external host controller. Read controller number
from the device.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a53cf949

net: stmmac: dwmac-intel-plat: remove redundant null check before clk_disable_unprepare() · 6b5472d4

Zhang Changzhong authored Sep 09, 2020

Because clk_prepare_enable() and clk_disable_unprepare() already checked
NULL clock parameter, so the additional checks are unnecessary, just
remove them.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6b5472d4

net: pxa168_eth: remove redundant null check before clk_disable_unprepare() · a0d48518

Zhang Changzhong authored Sep 09, 2020

Because clk_prepare_enable() and clk_disable_unprepare() already checked
NULL clock parameter, so the additional checks are unnecessary, just
remove them.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a0d48518

Merge branch 'SMSC-Cleanups-and-clock-setup' · 34e43543

David S. Miller authored Sep 09, 2020

Marco Felsch says:

====================
SMSC: Cleanups and clock setup

this small series cleans the smsc-phy code a bit and adds the support to
specify the phy clock source. Adding the phy clock source support is
also the main purpose of this series.

Each file has its own changelog.

Thanks a lot to Florian and Andrew for reviewing it.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

34e43543

net: phy: smsc: LAN8710/20: remove PHY_RST_AFTER_CLK_EN flag · d65af218

Marco Felsch authored Sep 09, 2020

Don't reset the phy without respect to the PHY library state machine
because this breaks the phy IRQ mode. The same behaviour can be archived
now by specifying the refclk.
Signed-off-by: Marco Felsch <m.felsch@pengutronix.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d65af218

net: phy: smsc: LAN8710/20: add phy refclk in support · bedd8d78

Marco Felsch authored Sep 09, 2020

Add support to specify the clock provider for the PHY refclk and don't
rely on 'magic' host clock setup. [1] tried to address this by
introducing a flag and fixing the corresponding host. But this commit
breaks the IRQ support since the irq setup during .config_intr() is
thrown away because the reset comes from the side without respecting the
current PHY state within the PHY library state machine. Furthermore the
commit fixed the problem only for FEC based hosts other hosts acting
like the FEC are not covered.

This commit goes the other way around to address the bug fixed by [1].
Instead of resetting the device from the side every time the refclk gets
(re-)enabled it requests and enables the clock till the device gets
removed. Now the PHY library is the only place where the PHY gets reset
to respect the PHY library state machine.

[1] commit 7f64e5b1 ("net: phy: smsc: LAN8710/20: add
    PHY_RST_AFTER_CLK_EN flag")
Signed-off-by: Marco Felsch <m.felsch@pengutronix.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bedd8d78