Commits · 72d61d30097fea6c74cc06eee3d911788f5f4c6b · Kirill Smelkov / linux

18 Sep, 2020 4 commits

Merge branch 'mlxsw-Support-dcbnl_setbuffer-dcbnl_getbuffer' · 72d61d30

David S. Miller authored Sep 17, 2020

Ido Schimmel says:

====================
mlxsw: Support dcbnl_setbuffer, dcbnl_getbuffer

Petr says:

On Spectrum, port buffers, also called port headroom, is where packets are
stored while they are parsed and the forwarding decision is being made. For
lossless traffic flows, in case shared buffer admission is not allowed,
headroom is also where to put the extra traffic received before the sent
PAUSE takes effect.

Linux supports two DCB interfaces related to the headroom: dcbnl_setbuffer
for configuration, and dcbnl_getbuffer for inspection. This patch set
implements them.

With dcbnl_setbuffer in place, there will be two sources of authority over
the ingress configuration: the DCB ETS hook, because ETS configuration is
mirrored to ingress, and the DCB setbuffer hook. mlxsw is in a similar
situation on the egress side, where there are two sources of the ETS
configuration: the DCB ETS hook, and the TC qdisc hooks. This is a
non-intuitive situation, because the way the ASIC ends up being configured
depends not only on the actual configured bits, but also on the order in
which they were configured.

To prevent these issues on the ingress side, two configuration modes will
exist: DCB mode and TC mode. DCB ETS will keep getting projected to ingress
in the (default) DCB mode. When a qdisc is installed on a port, it will be
switched to the TC mode, the ingress configuration will be done through the
dcbnl_setbuffer callback. The reason is that the dcbnl_setbuffer hook is
not standardized and supported by lldpad. Projecting DCB ETS configuration
to ingress is a reasonable heuristic to configure ingress especially when
PFC is in effect.

In patch #1, the toggle between the DCB and TC modes of headroom
configuration, described above, is introduced.

Patch #2 implements dcbnl_getbuffer and dcbnl_setbuffer. dcbnl_getbuffer
can be always used to determine the current port headroom configuration.
dcbnl_setbuffer is only permitted in the TC mode.

In patch #3, make the qdisc module toggle the headroom mode from DCB to TC
and back, depending on whether there is an offloaded qdisc on the port.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

72d61d30

mlxsw: spectrum_qdisc: Disable port buffer autoresize with qdiscs · 509f04ca

Petr Machata authored Sep 17, 2020

There are two interfaces to configure ETS: qdiscs and DCB. Historically,
DCB ETS configuration was projected to ingress as well, and configured port
buffers. Qdisc was not.

Keep qdiscs behaving this way, and if an offloaded qdisc is configured on a
port, move this port's headroom to a manual mode, thus allowing
configuration of port buffers through dcbnl_setbuffer.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

509f04ca

mlxsw: spectrum_dcb: Implement dcbnl_setbuffer / getbuffer · 5ebc6031

Petr Machata authored Sep 17, 2020

Add dcbnl_setbuffer, which bounces requests if a headroom is in DCB mode.
Implement dcbnl_getbuffer such that it can always be used to determine
port-buffer configuration, regardless of headroom mode.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5ebc6031

mlxsw: spectrum_buffers: Support two headroom modes · 69e408a2

Petr Machata authored Sep 17, 2020

There are two interfaces to configure ETS: qdiscs and DCB. Historically,
DCB ETS configuration was projected to ingress as well, and configured port
buffers. Qdisc was not.

So as not to break clients that today use DCB ETS and PFC and rely on
getting a reasonable ingress buffer priomap, keep the ETS mirroring in
effect.

Since qdiscs have not done this mirroring historically, it is reasonable
not to introduce it, but rather permit manual ingress configuration through
dcbnl_setbuffer only in the qdisc mode.

This will require a toggle to indicate whether buffer sizes should be
autocomputed or taken from dcbnl_setbuffer, and likewise for priomaps.
Introduce such and initialize it, and guard port buffer size configuration
as appropriate. The toggle is currently left in the DCB position. In a
following patch, qdisc code will switch it.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

69e408a2

17 Sep, 2020 25 commits

netlink: add spaces around '&' in netlink_recv/sendmsg() · 4d11af5d

Yang Yingliang authored Sep 17, 2020

It's hard to read the code without spaces around '&',
for better reading, add spaces around '&'.
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4d11af5d

netdev: Remove unused functions · 2492c205

YueHaibing authored Sep 17, 2020

There is no callers in tree, so can remove it.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2492c205

mptcp: Fix unsigned 'max_seq' compared with zero in mptcp_data_queue_ofo · c2ec6bc0

Ye Bin authored Sep 17, 2020

Fixes coccicheck warnig:
net/mptcp/protocol.c:164:11-18: WARNING: Unsigned expression compared with zero: max_seq > 0

Fixes: ab174ad8 ("mptcp: move ooo skbs into msk out of order queue")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Ye Bin <yebin10@huawei.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c2ec6bc0

Merge branch... · 3ce406bd

David S. Miller authored Sep 17, 2020

Merge branch 'net-marvell-prestera-Add-Switchdev-driver-for-Prestera-family-ASIC-device-98DX3255-AC3x'

Vadym Kochan says:

====================
net: marvell: prestera: Add Switchdev driver for Prestera family ASIC device 98DX3255 (AC3x)

Marvell Prestera 98DX3255 integrates up to 24 ports of 1GbE with 8
ports of 10GbE uplinks or 2 ports of 40Gbps stacking for a largely
wireless SMB deployment.

Prestera Switchdev is a firmware based driver that operates via PCI bus.  The
current implementation supports only boards designed for the Marvell Switchdev
solution and requires special firmware.

This driver implementation includes only L1, basic L2 support, and RX/TX.

The core Prestera switching logic is implemented in prestera_main.c, there is
an intermediate hw layer between core logic and firmware. It is
implemented in prestera_hw.c, the purpose of it is to encapsulate hw
related logic, in future there is a plan to support more devices with
different HW related configurations.

The following Switchdev features are supported:

    - VLAN-aware bridge offloading
    - VLAN-unaware bridge offloading
    - FDB offloading (learning, ageing)
    - Switchport configuration

The original firmware image is uploaded to the linux-firmware repository.

PATCH v9:
    1) Replace read_poll_timeout_atomic() by original 'do {} while()' loop
       because it works much better than read_poll_timeout_atomic()
       considering the TX rate. Also it fixes warning reported on v8.

    2) Use ENOENT instead of EEXIST when item is not found in few
       places - prestera_hw.c and prestera_rxtx.c

    Patches updated:
        [1] net: marvell: prestera: Add driver for Prestera family ASIC devices

PATCH v8:
    1) Put license in one line.

    2) Sort includes.

    3) Add missing comma for last enum member

    4) Return original error code from last called func
       in places where instead other error code was used.

    5) Add comma for last member in initialized struct in prestera_hw.c

    6) Do not initialize 'int err = 0' where it is not needed.

    7) Simplify device-tree "marvell,prestera" node parsing by removing not
       needed checking on 'np == NULL'.

    8) Use u32p_replace_bits() instead of open-coded ((word & ~mask) | val)

    9) Use dev_warn_ratelimited() instead of pr_warn_ratelimited to indicate the device
        instance in prestera_rxtx.c

    10) Simplify circular buffer list creation in prestera_sdma_{rx,tx}_init() by using
        do { } while (prev != tail) construction.

    11) Use MSEC_PER_SEC instead of hard-coded 1000.

    12) Use traditional error handling pattern:

       err = F();
       if (err)
           return err;

    13) Use ether_addr_copy() instead of memcpy() for mac FDB copying in prestera_hw.c

    14) Drop swdev->ageing_time member which is not used.

    15) Fix ageing macro to be in ms instead of seconds.

    Patches updated:
        [1] net: marvell: prestera: Add driver for Prestera family ASIC devices
	[2] net: marvell: prestera: Add PCI interface support
        [3] net: marvell: prestera: Add basic devlink support
	[4] net: marvell: prestera: Add ethtool interface support
	[5] net: marvell: prestera: Add Switchdev driver implementation

PATCH v7:
    1) Use ether_addr_copy() in prestera_main.c:prestera_port_set_mac_address()
       instead of memcpy().

    2) Removed not needed device's DMA address range check on
       dma_pool_alloc() in prestera_rxtx.c:prestera_sdma_buf_init(),
       this should be handled by dma_xxx() API considerig device's DMA mask.

    3) Removed not needed device's DMA address range check on
       dma_map_single() in prestera_rxtx.c:prestera_sdma_rx_skb_alloc(),
       this should be handled by dma_xxx() API considerig device's DMA mask.

    4) Add comment about port mac address limitation in the code where
       it is used and checked - prestera_main.c:

           - prestera_is_valid_mac_addr()
           - prestera_port_create()

    5) Add missing destroy_workqueue(swdev_wq) in prestera_switchdev.c:prestera_switchdev_init()
       on error path handling.

    Patches updated:
        [1] net: marvell: prestera: Add driver for Prestera family ASIC devices
        [5] net: marvell: prestera: Add Switchdev driver implementation

PATCH v6:
    1) Use rwlock to protect port list on create/delete stages. The list
       is mostly readable by fw event handler or packets receiver, but
       updated only on create/delete port which are performed on switch init/fini
       stages.

    2) Remove not needed variable initialization in prestera_dsa.c:prestera_dsa_parse()

    3) Get rid of bounce buffer used by tx handler in prestera_rxtx.c,
       the bounce buffer should be handled by dma_xxx API via swiotlb.

    4) Fix PRESTERA_SDMA_RX_DESC_PKT_LEN macro by using correct GENMASK(13, 0) in prestera_rxtx.c

    Patches updated:
        [1] net: marvell: prestera: Add driver for Prestera family ASIC devices

PATCH v5:
    0) add Co-developed tags for people who was involved in development.

    1) Make SPDX license as separate comment

    2) Change 'u8 *' -> 'void *', It allows to avoid not-needed u8* casting.

    3) Remove "," in terminated enum's.

    4) Use GENMASK(end, start) where it is applicable in.

    5) Remove not-needed 'u8 *' casting.

    6) Apply common error-check pattern

    7) Use ether_addr_copy instead of memcpy

    8) Use define for maximum MAC address range (255)

    9) Simplify prestera_port_state_set() in prestera_main.c by
       using separate if-blocks for state setting:

        if (is_up) {
        ...
        } else {
        ...
        }

      which makes logic more understandable.

    10) Simplify sdma tx wait logic when checking/updating tx_ring->burst.

    11) Remove not-needed packed & aligned attributes

    12) Use USEC_PER_MSEC as multiplier when converting ms -> usec on calling
        readl_poll_timeout.

    13) Simplified some error path handling by simple return error code in.

    14) Remove not-needed err assignment in.

    15) Use dev_err() in prestera_devlink_register(...).

    Patches updated:
        [1] net: marvell: prestera: Add driver for Prestera family ASIC devices
	[2] net: marvell: prestera: Add PCI interface support
        [3] net: marvell: prestera: Add basic devlink support
	[4] net: marvell: prestera: Add ethtool interface support
	[5] net: marvell: prestera: Add Switchdev driver implementation

PATCH v4:
    1) Use prestera_ prefix in netdev_ops variable.

    2) Kconfig: use 'default PRESTERA' build type for CONFIG_PRESTERA_PCI to be
       synced by default with prestera core module.

    3) Use memcpy_xxio helpers in prestera_pci.c for IO buffer copying.

    4) Generate fw image path via snprintf() instead of macroses.

    5) Use pcim_ helpers in prestera_pci.c which simplified the
       probe/remove logic.

    6) Removed not needed initializations of variables which are used in
       readl_poll_xxx() helpers.

    7) Fixed few grammar mistakes in patch[2] description.

    8) Export only prestera_ethtool_ops struct instead of each
       ethtool handler.

    9) Add check for prestera_dev_check() in switchdev event handling to
       make sure there is no wrong topology.

    Patches updated:
        [1] net: marvell: prestera: Add driver for Prestera family ASIC devices
	[2] net: marvell: prestera: Add PCI interface support
	[4] net: marvell: prestera: Add ethtool interface support
	[5] net: marvell: prestera: Add Switchdev driver implementation

PATCH v3:
    1) Simplify __be32 type casting in prestera_dsa.c

    2) Added per-patch changelog under "---" line.

PATCH v2:
    1) Use devlink_port_type_clear()

    2) Add _MS prefix to timeout defines.

    3) Remove not-needed packed attribute from the firmware ipc structs,
       also the firmware image needs to be uploaded too (will do it soon).

    4) Introduce prestera_hw_switch_fini(), to be mirrored with init and
       do simple validation if the event handlers are unregistered.

    5) Use kfree_rcu() for event handler unregistering.

    6) Get rid of rcu-list usage when dealing with ports, not needed for
       now.

    7) Little spelling corrections in the error/info messages.

    8) Make pci probe & remove logic mirrored.

    9) Get rid of ETH_FCS_LEN in headroom setting, not needed.

PATCH:
    1) Fixed W=1 warnings

    2) Renamed PCI driver name to be more generic "Prestera DX" because
       there will be more devices supported.

    3) Changed firmware image dir path: marvell/ -> mrvl/prestera/
       to be aligned with location in linux-firmware.git (if such
       will be accepted).

RFC v3:
    1) Fix prestera prefix in prestera_rxtx.c

    2) Protect concurrent access from multiple ports on multiple CPU system
       on tx path by spinlock in prestera_rxtx.c

    3) Try to get base mac address from device-tree, otherwise use a random generated one.

    4) Move ethtool interface support into separate prestera_ethtool.c file.

    5) Add basic devlink support and get rid of physical port naming ops.

    6) Add STP support in Switchdev driver.

    7) Removed MODULE_AUTHOR

    8) Renamed prestera.c -> prestera_main.c, and kernel module to
       prestera.ko

RFC v2:
    1) Use "pestera_" prefix in struct's and functions instead of mvsw_pr_

    2) Original series split into additional patches for Switchdev ethtool support.

    3) Use major and minor firmware version numbers in the firmware image filename.

    4) Removed not needed prints.

    5) Use iopoll API for waiting on register's value in prestera_pci.c

    6) Use standart approach for describing PCI ID matching section instead of using
       custom wrappers in prestera_pci.c

    7) Add RX/TX support in prestera_rxtx.c.

    8) Rewritten prestera_switchdev.c with following changes:
       - handle netdev events from prestera.c

       - use struct prestera_bridge for bridge objects, and get rid of
         struct prestera_bridge_device which may confuse.

       - use refcount_t

    9) Get rid of macro usage for sending fw requests in prestera_hw.c

    10) Add base_mac setting as module parameter. base_mac is required for
        generation default port's mac.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

3ce406bd

dt-bindings: marvell,prestera: Add description for device-tree bindings · 40acc052

Vadym Kochan authored Sep 16, 2020

Add brief description how to configure base mac address binding in
device-tree.

Describe requirement for the PCI port which is connected to the ASIC, to
allow access to the firmware related registers.
Signed-off-by: Vadym Kochan <vadym.kochan@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>

40acc052

net: marvell: prestera: Add Switchdev driver implementation · e1189d9a

Vadym Kochan authored Sep 16, 2020

The following features are supported:

    - VLAN-aware bridge offloading
    - VLAN-unaware bridge offloading
    - FDB offloading (learning, ageing)
    - Switchport configuration

Currently there are some limitations like:

    - Only 1 VLAN-aware bridge instance supported
    - FDB ageing timeout parameter is set globally per device
Co-developed-by: Serhiy Boiko <serhiy.boiko@plvision.eu>
Signed-off-by: Serhiy Boiko <serhiy.boiko@plvision.eu>
Co-developed-by: Serhiy Pshyk <serhiy.pshyk@plvision.eu>
Signed-off-by: Serhiy Pshyk <serhiy.pshyk@plvision.eu>
Co-developed-by: Taras Chornyi <taras.chornyi@plvision.eu>
Signed-off-by: Taras Chornyi <taras.chornyi@plvision.eu>
Signed-off-by: Vadym Kochan <vadym.kochan@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>

e1189d9a

net: marvell: prestera: Add ethtool interface support · a97d3c69

Vadym Kochan authored Sep 16, 2020

The ethtool API provides support for the configuration of the following
features: speed and duplex, auto-negotiation, MDI-x, forward error
correction, port media type. The API also provides information about the
port status, hardware and software statistic. The following limitation
exists:

    - port media type should be configured before speed setting
    - ethtool -m option is not supported
    - ethtool -p option is not supported
    - ethtool -r option is supported for RJ45 port only
    - the following combination of parameters is not supported:

          ethtool -s sw1pX port XX autoneg on

    - forward error correction feature is supported only on SFP ports, 10G
      speed

    - auto-negotiation and MDI-x features are not supported on
      Copper-to-Fiber SFP module
Co-developed-by: Andrii Savka <andrii.savka@plvision.eu>
Signed-off-by: Andrii Savka <andrii.savka@plvision.eu>
Co-developed-by: Serhiy Boiko <serhiy.boiko@plvision.eu>
Signed-off-by: Serhiy Boiko <serhiy.boiko@plvision.eu>
Signed-off-by: Vadym Kochan <vadym.kochan@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>

a97d3c69

net: marvell: prestera: Add basic devlink support · 34dd1710

Vadym Kochan authored Sep 16, 2020

Add very basic support for devlink interface:

    - driver name
    - fw version
    - devlink ports
Signed-off-by: Vadym Kochan <vadym.kochan@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>

34dd1710

net: marvell: prestera: Add PCI interface support · 4c2703df

Vadym Kochan authored Sep 16, 2020

Add PCI interface driver for Prestera Switch ASICs family devices, which
provides:

    - Firmware loading mechanism
    - Requests & events handling to/from the firmware
    - Access to the firmware on the bus level

The firmware has to be loaded each time the device is reset. The driver
is loading it from:

    /lib/firmware/mrvl/prestera/mvsw_prestera_fw-v{MAJOR}.{MINOR}.img

The full firmware image version is located within the internal header
and consists of 3 numbers - MAJOR.MINOR.PATCH. Additionally, driver has
hard-coded minimum supported firmware version which it can work with:

    MAJOR - reflects the support on ABI level between driver and loaded
            firmware, this number should be the same for driver and loaded
            firmware.

    MINOR - this is the minimum supported version between driver and the
            firmware.

    PATCH - indicates only fixes, firmware ABI is not changed.

Firmware image file name contains only MAJOR and MINOR numbers to make
driver be compatible with any PATCH version.
Co-developed-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Signed-off-by: Vadym Kochan <vadym.kochan@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>

4c2703df

net: marvell: prestera: Add driver for Prestera family ASIC devices · 501ef306

Vadym Kochan authored Sep 16, 2020

Marvell Prestera 98DX326x integrates up to 24 ports of 1GbE with 8
ports of 10GbE uplinks or 2 ports of 40Gbps stacking for a largely
wireless SMB deployment.

The current implementation supports only boards designed for the Marvell
Switchdev solution and requires special firmware.

The core Prestera switching logic is implemented in prestera_main.c,
there is an intermediate hw layer between core logic and firmware. It is
implemented in prestera_hw.c, the purpose of it is to encapsulate hw
related logic, in future there is a plan to support more devices with
different HW related configurations.

This patch contains only basic switch initialization and RX/TX support
over SDMA mechanism.

Currently supported devices have DMA access range <= 32bit and require
ZONE_DMA to be enabled, for such cases SDMA driver checks if the skb
allocated in proper range supported by the Prestera device.

Also meanwhile there is no TX interrupt support in current firmware
version so recycling work is scheduled on each xmit.

Port's mac address is generated from the switch base mac which may be
provided via device-tree (static one or as nvme cell), or randomly
generated. This is required by the firmware.
Co-developed-by: Andrii Savka <andrii.savka@plvision.eu>
Signed-off-by: Andrii Savka <andrii.savka@plvision.eu>
Co-developed-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Co-developed-by: Serhiy Boiko <serhiy.boiko@plvision.eu>
Signed-off-by: Serhiy Boiko <serhiy.boiko@plvision.eu>
Co-developed-by: Serhiy Pshyk <serhiy.pshyk@plvision.eu>
Signed-off-by: Serhiy Pshyk <serhiy.pshyk@plvision.eu>
Co-developed-by: Taras Chornyi <taras.chornyi@plvision.eu>
Signed-off-by: Taras Chornyi <taras.chornyi@plvision.eu>
Co-developed-by: Volodymyr Mytnyk <volodymyr.mytnyk@plvision.eu>
Signed-off-by: Volodymyr Mytnyk <volodymyr.mytnyk@plvision.eu>
Signed-off-by: Vadym Kochan <vadym.kochan@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>

501ef306

genetlink: Remove unused function genl_err_attr() · 5114b331

YueHaibing authored Sep 16, 2020

It is never used, so can remove it.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5114b331

net/sched: Remove unused function qdisc_queue_drop_head() · 2b7ea122

YueHaibing authored Sep 16, 2020

It is not used since commit a09ceb0e ("sched: remove qdisc->drop")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2b7ea122

selftests: mptcp: interpret \n as a new line · 8b974778

Matthieu Baerts authored Sep 16, 2020

In case of errors, this message was printed:

  (...)
  # read: Resource temporarily unavailable
  #  client exit code 0, server 3
  # \nnetns ns1-0-BJlt5D socket stat for 10003:
  (...)

Obviously, the idea was to add a new line before the socket stat and not
print "\nnetns".

Fixes: b08fbf24 ("selftests: add test-cases for MPTCP MP_JOIN")
Fixes: 048d19d4 ("mptcp: add basic kselftest for mptcp")
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8b974778

net/packet: Fix a comment about mac_header · b79a80bd

Xie He authored Sep 16, 2020

1. Change all "dev->hard_header" to "dev->header_ops"

2. On receiving incoming frames when header_ops == NULL:

The comment only says what is wrong, but doesn't say what is right.
This patch changes the comment to make it clear what is right.

3. On transmitting and receiving outgoing frames when header_ops == NULL:

The comment explains that the LL header will be later added by the driver.

However, I think it's better to simply say that the LL header is invisible
to us. This phrasing is better from a software engineering perspective,
because this makes it clear that what happens in the driver should be
hidden from us and we should not care about what happens internally in the
driver.

4. On resuming the LL header (for RAW frames) when header_ops == NULL:

The comment says we are "unlikely" to restore the LL header.

However, we should say that we are "unable" to restore it.
It's not possible (rather than not likely) to restore it, because:

1) There is no way for us to restore because the LL header internally
processed by the driver should be invisible to us.

2) In function packet_rcv and tpacket_rcv, the code only tries to restore
the LL header when header_ops != NULL.

Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Signed-off-by: Xie He <xie.he.0141@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b79a80bd

Merge branch 'net-hns3-updates-for-next' · 31660a97

David S. Miller authored Sep 17, 2020

Huazhong Tan says:

====================
net: hns3: updates for -next

There are some optimizations related to IO path.

Change since V1:
- fixes a unsuitable handling in hns3_lb_clear_tx_ring() of #6 which
  pointed out by Saeed Mahameed.

previous version:
V1: https://patchwork.ozlabs.org/project/netdev/cover/1600085217-26245-1-git-send-email-tanhuazhong@huawei.com/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

31660a97

net: hns3: use napi_consume_skb() when cleaning tx desc · 619ae331

Yunsheng Lin authored Sep 16, 2020

Use napi_consume_skb() to batch consuming skb when cleaning
tx desc in NAPI polling.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

619ae331

net: hns3: use writel() to optimize the barrier operation · 48ee56fd

Yunsheng Lin authored Sep 16, 2020

writel() can be used to order I/O vs memory by default when
writing portable drivers. Use writel() to replace wmb() +
writel_relaxed(), and writel() is dma_wmb() + writel_relaxed()
for ARM64, so there is an optimization here because dma_wmb()
is a lighter barrier than wmb().
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

48ee56fd

net: hns3: optimize the rx clean process · 8c30e194

Yunsheng Lin authored Sep 16, 2020

Currently HNS3_RING_RX_RING_FBDNUM_REG register is read to determine
how many rx desc can be cleaned. To avoid the register read operation
in the critical data path, use the valid bit in the rx desc to determine
if a specific rx desc can be cleaned.

The hns3 driver clear valid bit in the rx desc before notifying the
rx desc to the hw, and hw will only set the valid bit of the rx desc
after corresponding buffer is filled with packet data and other field
in the rx desc is set accordingly.

Add hns3_rx_ring_move_fw() function to clear the valid bit in the rx
desc before moving rx ring's next_to_clean forward to avoid double
cleaning a rx desc, also add a dma_rmb() barrier in hns3_handle_rx_bd()
to make sure valid bit is set before reading other field in the rx desc.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8c30e194

net: hns3: optimize the tx clean process · 20d06ca2

Yunsheng Lin authored Sep 16, 2020

Currently HNS3_RING_TX_RING_HEAD_REG register is read to determine
how many tx desc can be cleaned. To avoid the register read operation
in the critical data path, use the valid bit in the tx desc to determine
if a specific tx desc can be cleaned.

The hns3 driver sets valid bit in the tx desc before ringing a doorbell
to the hw, and hw will only clear the valid bit of the tx desc after
corresponding packet is sent out to the wire. And because next_to_use
for tx ring is a changing variable when the driver is filling the tx
desc, so reuse the pull_len for rx ring to record the tx desc that has
notified to the hw, so that hns3_nic_reclaim_desc() can decide how many
tx desc's valid bit need checking when reclaiming tx desc.

And io_err_cnt stat is also removed for it is not used anymore.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

20d06ca2

net: hns3: batch tx doorbell operation · f6061a05

Yunsheng Lin authored Sep 16, 2020

Use netdev_xmit_more() to defer the tx doorbell operation when
the skb is passed to the driver continuously. By doing this we
can improve the overall xmit performance by avoid some doorbell
operations.

Also, the tx_err_cnt stat is not used, so rename it to tx_more
stat.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f6061a05

net: hns3: batch the page reference count updates · aeda9bf8

Yunsheng Lin authored Sep 16, 2020

Batch the page reference count updates instead of doing them
one at a time. By doing this we can improve the overall receive
performance by avoid some atomic increment operations when the
rx page is reused.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

aeda9bf8

cxgb4vf: convert to use DEFINE_SEQ_ATTRIBUTE macro · b948577b

Liu Shixin authored Sep 16, 2020

Use DEFINE_SEQ_ATTRIBUTE macro to simplify the code.
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b948577b

ionic: dynamic interrupt moderation · 04a83459

Shannon Nelson authored Sep 15, 2020

Use the dim library to manage dynamic interrupt
moderation in ionic.

v3: rebase
v2: untangled declarations in ionic_dim_work()
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

04a83459

net/smc: check variable before dereferencing in smc_close.c · ddcc9b7f

Karsten Graul authored Sep 15, 2020

smc->clcsock and smc->clcsock->sk are used before the check if they can
be dereferenced. Fix this by checking the variables first.

Fixes: a60a2b1e ("net/smc: reduce active tcp_listen workers")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ddcc9b7f

net: bridge: mcast: don't ignore return value of __grp_src_toex_excl · d5bf31dd

Nikolay Aleksandrov authored Sep 15, 2020

When we're handling TO_EXCLUDE report in EXCLUDE filter mode we should
not ignore the return value of __grp_src_toex_excl() as we'll miss
sending notifications about group changes.

Fixes: 5bf1e00b ("net: bridge: mcast: support for IGMPV3/MLDv2 CHANGE_TO_INCLUDE/EXCLUDE report")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d5bf31dd

16 Sep, 2020 11 commits

net: stmmac: Add support to Ethtool get/set ring parameters · aa042f60

Song, Yoong Siang authored Sep 16, 2020

This patch add support to --show-ring & --set-ring Ethtool functions:
- Adding min, max, power of two check to new ring parameter's value.
- Bring down the network interface before changing the value of ring
  parameters.
- Bring up the network interface after changing the value of ring
  parameters.
Signed-off-by: Song, Yoong Siang <yoong.siang.song@intel.com>
Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

aa042f60

Merge branch 'mlxsw-Refactor-headroom-management' · 18e9a407

David S. Miller authored Sep 16, 2020

Ido Schimmel says:

====================
mlxsw: Refactor headroom management

Petr says:

On Spectrum, port buffers, also called port headroom, is where packets are
stored while they are parsed and the forwarding decision is being made. For
lossless traffic flows, in case shared buffer admission is not allowed,
headroom is also where to put the extra traffic received before the sent
PAUSE takes effect. Another aspect of the port headroom is the so called
internal buffer, which is used for egress mirroring.

Linux supports two DCB interfaces related to the headroom: dcbnl_setbuffer
for configuration, and dcbnl_getbuffer for inspection. In order to make it
possible to implement these interfaces, it is first necessary to clean up
headroom handling, which is currently strewn in several places in the
driver.

The end goal is an architecture whereby it is possible to take a copy of
the current configuration, adjust parameters, and then hand the proposed
configuration over to the system to implement it. When everything works,
the proposed configuration is accepted and saved. First, this centralizes
the reconfiguration handling to one function, which takes care of
coordinating buffer size changes and priority map changes to avoid
introducing drops. Second, the fact that the configuration is all in one
place makes it easy to keep a backup and handle error path rollbacks, which
were previously hard to understand.

Patch #1 introduces struct mlxsw_sp_hdroom, which will keep port headroom
configuration.

Patch #2 unifies handling of delay provision between PFC and PAUSE. From
now on, delay is to be measured in bytes of extra space, and will not
include MTU. PFC handler sets the delay directly from the parameter it gets
through the DCB interface. For PAUSE, MLXSW_SP_PAUSE_DELAY is converted to
have the same meaning.

In patches #3-#5, MTU, lossiness and priorities are gradually moved over to
struct mlxsw_sp_hdroom.

In patches #6-#11, handling of buffer resizing and priority maps is moved
from spectrum.c and spectrum_dcb.c to spectrum_buffers.c. The API is
gradually adapted so that struct mlxsw_sp_hdroom becomes the main interface
through which the various clients express how the headroom should be
configured.

Patch #12 is a small cleanup that the previous transformation made
possible.

In patch #13, the port init code becomes a boring client of the headroom
code, instead of rolling its own thing.

Patches #14 and #15 move handling of internal mirroring buffer to the new
headroom code as well. Previously, this code was in the SPAN module. This
patchset converts the SPAN module to another boring client of the headroom
code.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

18e9a407

mlxsw: spectrum_buffers: Manage internal buffer in the hdroom code · 22881adf

Petr Machata authored Sep 16, 2020

Traffic mirroring modes that are in-chip implemented on egress need an
internal buffer to work. As the only client, the SPAN module was managing
the buffer so far. However logically it belongs to the buffers module. E.g.
buffer size validation needs to take the size of the internal buffer into
account.

Therefore move the related code from SPAN to spectrum_buffers. Move over
the callbacks that determine the minimum buffer size as a function of
maximum speed and MTU. Add a field describing the internal buffer to struct
mlxsw_sp_hdroom. Extend mlxsw_sp_hdroom_bufs_reset_sizes() to take care of
sizing the internal buffer as well. Change the SPAN module to invoke that
function and mlxsw_sp_hdroom_configure() like all the other hdroom clients.
Drop the now-unnecessary mlxsw_sp_span_port_buffer_disable().
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

22881adf

mlxsw: spectrum_buffers: Introduce shared buffer ops · a41b9626

Petr Machata authored Sep 16, 2020

The size of the internal buffer is currently calculated in the SPAN module.
Logically it belongs to the spectrum_buffers module, where it should be
moved. However, that being a chip-specific operation, it needs dynamic
dispatch. There currently is a chip-specific structure for description of
shared buffer values, struct mlxsw_sp_sb_vals. However placing ops into
this structure would be confusing. Therefore introduce a new per-chip
structure, currently empty, and initialize the ops pointer as appropriate.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a41b9626

mlxsw: spectrum_buffers: Convert mlxsw_sp_port_headroom_init() · 0cda1a9b

Petr Machata authored Sep 16, 2020

Currently mlxsw_sp_port_headroom_init() configures both priomap and buffers
by hand. Additionally, for port buffers, it configures buffer 0 with a size
that it will never again have if PFC configuration is touched.

Rewrite the init code to become a client of the new hdroom code. The only
difference in invocation is that the configuration is forced, so that it is
issued even if the desired configuration happens to match what is contained
in (hitherto not initialized with meaningful values) mlxsw_sp_port->hdroom.

Since now mlxsw_sp_port_headroom_init() initializes all the PG buffers to
meaningful values, mlxsw_sp_hdroom_configure_buffers() can avoid querying
the current configuration, and can fill the whole PBMC itself.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0cda1a9b

mlxsw: spectrum_buffers: Inline mlxsw_sp_sb_max_headroom_cells() · bd3e86a5

Petr Machata authored Sep 16, 2020

This function is now only used from the buffers module, and is a trivial
field reference. Just inline it and drop the related artifacts.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bd3e86a5

mlxsw: spectrum_buffers: Move here the new headroom code · 4c22f29f

Petr Machata authored Sep 16, 2020

Move all the headroom code to the spectrum_buffers module, where it
belongs.

Rename mlxsw_sp_pg_buf_threshold_get() and mlxsw_sp_pg_buf_pack() to
..._hdroom_... to match the naming convention of the new headroom code.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4c22f29f

mlxsw: spectrum: Move here the three-step headroom configuration from DCB · 7ace2c36

Petr Machata authored Sep 16, 2020

The ETS handler performs the headroom configuration in three steps: first
it resizes the buffers and adds any new ones. Then it redirects priorities
to the new buffers. And finally it sets the size of the now-unused buffers
to zero. This way no packet drops are introduced.

This sort of careful approach will also be useful for configuring port
buffer sizes and priority map by hand, through dcbnl_setbuffer. Therefore
move the code from the DCB handler to the generic headroom function.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7ace2c36

mlxsw: spectrum_dcb: Convert mlxsw_sp_port_pg_prio_map() to hdroom code · e9c97e0e

Petr Machata authored Sep 16, 2020

The new hdroom code has certain conventions: iteration over priorities is
done through a variable named `prio', configuration is not pushed unless it
is dirty, but a `force' flag can be used to override this, updated
configuration is written to port. Convert the function
mlxsw_sp_port_pg_prio_map() to use these conventions and rename
appropriately to fit in.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e9c97e0e

mlxsw: spectrum_dcb: Convert ETS handler fully to mlxsw_sp_hdroom_configure() · 8ec5e6b9

Petr Machata authored Sep 16, 2020

The ETS handler performs the headroom configuration in three steps: first
it resizes the buffers and adds any new ones. Then it redirects priorities
to the new buffers. And finally it sets the size of the now-unused buffers
to zero. This way no packet drops are introduced.

Both of the buffer size configuration operations are simply buffer size
configurations, there is no material difference between setting buffers to
zero and any other value. Therefore simply invoke the same
mlxsw_sp_hdroom_configure(), and drop mlxsw_sp_port_pg_destroy() and
mlxsw_sp_ets_has_pg() which are now unused.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8ec5e6b9

mlxsw: spectrum: Split headroom autoresize out of buffer configuration · 2d9f703f

Petr Machata authored Sep 16, 2020

Split mlxsw_sp_port_headroom_set() to three functions.
mlxsw_sp_hdroom_bufs_reset_sizes() changes the sizes of the individual PG
buffers, and mlxsw_sp_hdroom_configure_buffers() will actually apply the
configuration. A third function, mlxsw_sp_hdroom_bufs_fit(), verifies that
the requested buffer configuration matches total headroom size
requirements.

Add wrappers, mlxsw_sp_hdroom_configure() and __..., that will eventually
perform full headroom configuration, but for now, only have them verify the
configured headroom size, and invoke mlxsw_sp_hdroom_configure_buffers().
Have them take the `force` argument to prepare for a later patch, even
though it is currently unused.

Note that the loop in mlxsw_sp_hdroom_configure_buffers() only goes through
DCBX_MAX_BUFFERS. Since there is no logic to configure the control buffer,
it needs to keep the values queried from the FW. Eventually this function
should configure all the PGs.

Note that conversion of __mlxsw_sp_dcbnl_ieee_setets() is not trivial. That
function performs the headroom configuration in three steps: first it
resizes the buffers and adds any new ones. Then it redirects priorities to
the new buffers. And finally it sets the size of the now-unused buffers to
zero. This way no packet drops are introduced.

So after invoking mlxsw_sp_hdroom_bufs_reset_sizes(), tweak the
configuration to keep the old sizes of PG buffers for those buffers whose
size was set to zero.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2d9f703f