Commits · ab9cb729ab0d4151f7ba76dd3ff592242a694754 · nexedi / linux

06 Dec, 2018 3 commits

phy: marvell: Rename mii_lpa_to_linkmode_lpa_t · ab9cb729

Andrew Lunn authored Dec 05, 2018

Rename mii_lpa_to_linkmode_lpa_t to mii_lpa_mod_linkmode_lpa_t to
indicate it modifies the passed linkmode bitmap, without clearing any
other bits.

Also, ensure bit are clear which the lpa indicates should not be set.

Fixes: c0ec3c27 ("net: phy: Convert u32 phydev->lp_advertising to linkmode")
Suggested-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

ab9cb729

net: mii: Rename mii_stat1000_to_linkmode_lpa_t · 78a24df3

Andrew Lunn authored Dec 05, 2018

Rename mii_stat1000_to_linkmode_lpa_t to
mii_stat1000_mod_linkmode_lpa_t to indicate it modifies the passed
linkmode bitmap, without clearing any other bits.

Add a helper to set/clear bits in a linkmode.

Use this helper to ensure bit are clear which the stat1000 indicates
should not be set.

Fixes: c0ec3c27 ("net: phy: Convert u32 phydev->lp_advertising to linkmode")
Suggested-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

78a24df3

net: mii: Fix autoneg in mii_lpa_to_linkmode_lpa_t() · 5f15eed2

Andrew Lunn authored Dec 05, 2018

mii_adv_to_linkmode_adv_t() clears all bits before setting it needs to
set. This means the freshly set Autoneg gets cleared.

Change the order, and add comments about it clearing the old content
of the bitmap.

Fixes: c0ec3c27 ("net: phy: Convert u32 phydev->lp_advertising to linkmode")
Reported-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

5f15eed2

05 Dec, 2018 6 commits

net: documentation: build a directory structure for drivers · b255e500

Jakub Kicinski authored Dec 03, 2018

Documentation/networking/ is full of cryptically named files with
driver documentation.  This makes finding interesting information
at a glance really hard.  Move all those files into a directory
called device_drivers (since not all drivers are for device) and
fix up references.

RFC v0.1 -> RFC v1:
 - also add .txt suffix to the files which are missing it (Quentin)
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: David Ahern <dsahern@gmail.com>
Acked-by: Henrik Austad <henrik@austad.us>
Signed-off-by: David S. Miller <davem@davemloft.net>

b255e500

tcp: reduce POLLOUT events caused by TCP_NOTSENT_LOWAT · a74f0fa0

Eric Dumazet authored Dec 04, 2018

TCP_NOTSENT_LOWAT socket option or sysctl was added in linux-3.12
as a step to enable bigger tcp sndbuf limits.

It works reasonably well, but the following happens :

Once the limit is reached, TCP stack generates
an [E]POLLOUT event for every incoming ACK packet.

This causes a high number of context switches.

This patch implements the strategy David Miller added
in sock_def_write_space() :

 - If TCP socket has a notsent_lowat constraint of X bytes,
   allow sendmsg() to fill up to X bytes, but send [E]POLLOUT
   only if number of notsent bytes is below X/2

This considerably reduces TCP_NOTSENT_LOWAT overhead,
while allowing to keep the pipe full.

Tested:
 100 ms RTT netem testbed between A and B, 100 concurrent TCP_STREAM

A:/# cat /proc/sys/net/ipv4/tcp_wmem
4096	262144	64000000
A:/# super_netperf 100 -H B -l 1000 -- -K bbr &

A:/# grep TCP /proc/net/sockstat
TCP: inuse 203 orphan 0 tw 19 alloc 414 mem 1364904 # This is about 54 MB of memory per flow :/

A:/# vmstat 5 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 256220672  13532 694976    0    0    10     0   28   14  0  1 99  0  0
 2  0      0 256320016  13532 698480    0    0   512     0 715901 5927  0 10 90  0  0
 0  0      0 256197232  13532 700992    0    0   735    13 771161 5849  0 11 89  0  0
 1  0      0 256233824  13532 703320    0    0   512    23 719650 6635  0 11 89  0  0
 2  0      0 256226880  13532 705780    0    0   642     4 775650 6009  0 12 88  0  0

A:/# echo 2097152 >/proc/sys/net/ipv4/tcp_notsent_lowat

A:/# grep TCP /proc/net/sockstat
TCP: inuse 203 orphan 0 tw 19 alloc 414 mem 86411 # 3.5 MB per flow

A:/# vmstat 5 5  # check that context switches have not inflated too much.
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      0 260386512  13592 662148    0    0    10     0   17   14  0  1 99  0  0
 0  0      0 260519680  13592 604184    0    0   512    13 726843 12424  0 10 90  0  0
 1  1      0 260435424  13592 598360    0    0   512    25 764645 12925  0 10 90  0  0
 1  0      0 260855392  13592 578380    0    0   512     7 722943 13624  0 11 88  0  0
 1  0      0 260445008  13592 601176    0    0   614    34 772288 14317  0 10 90  0  0
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a74f0fa0

Merge branch 'act_tunnel_key-support-key-less-tunnels' · 4dc88ce6

David S. Miller authored Dec 04, 2018

Or Gerlitz says:

====================
net/sched: act_tunnel_key: support key-less tunnels

This short series from Adi Nissim allows to support key-less tunnels
by the tc tunnel key actions, which is needed for some GRE use-cases.

changes from V0:
 - addresses build warning spotted by kbuild, make sure to always init
   to zero the tunnel key
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

4dc88ce6

net/sched: act_tunnel_key: Don't dump dst port if it wasn't set · 1c25324c

Adi Nissim authored Dec 02, 2018

It's possible to set a tunnel without a destination port. However,
on dump(), a zero dst port is returned to user space even if it was not
set, fix that.

Note that so far it wasn't required, b/c key less tunnels were not
supported and the UDP tunnels do require destination port.
Signed-off-by: Adi Nissim <adin@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1c25324c

net/sched: act_tunnel_key: Allow key-less tunnels · 80ef0f22

Adi Nissim authored Dec 02, 2018

Allow setting a tunnel without a tunnel key. This is required for
tunneling protocols, such as GRE, that define the key as an optional
field.
Signed-off-by: Adi Nissim <adin@mellanox.com>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

80ef0f22

qed: fix spelling mistake "Dispalying" -> "Displaying" · d1ecf8a6

Colin Ian King authored Dec 03, 2018

There is a spelling mistake in a DP_NOTICE message, fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d1ecf8a6

04 Dec, 2018 28 commits

Merge branch 'mlxsw-Add-one-armed-router-support' · 55827458

David S. Miller authored Dec 04, 2018

Ido Schimmel says:

====================
mlxsw: Add one-armed router support

Up until now, when a packet was routed by the ASIC through the same
router interface (RIF) from which it ingressed from, the ASIC passed the
sole copy of the packet to the kernel. This allowed the kernel to route
the packet and also potentially generate an ICMP redirect.

There are scenarios (e.g., "one-armed router") where packets are
intentionally routed this way and are therefore not deemed as
exceptions. In such scenarios the current method of trapping packets to
the CPU is problematic, as it results in major packet loss.

This patchset solves the problem by having the ASIC forward the packet,
but also send a copy to the CPU, which gives the kernel the opportunity
to generate required exceptions.

To prevent the kernel from forwarding such packets again, the driver
marks them with 'offload_l3_fwd_mark', which causes the kernel to
consume them in ip{,6}_forward_finish().

Patch #1 renames 'offload_mr_fwd_mark' to 'offload_l3_fwd_mark'. When
set, the field indicates that a packet was already forwarded in L3
(unicast / multicast) by a capable device.

Patch #2 teaches the kernel to consume unicast packets that have
'offload_l3_fwd_mark' set.

Patch #3 changes mlxsw to mirror loopbacked (iRIF == eRIF) packets,
instead of trapping them.

Patch #4 adds a test case for above mentioned scenario.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

55827458

selftests: mlxsw: Add one-armed router test · b6f153d3

Ido Schimmel authored Dec 04, 2018

Construct a "one-armed router" topology and test that packets are
forwarded by the ASIC and that a copy of the packet is sent to the
kernel, which does not forward the packet again.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b6f153d3

mlxsw: spectrum: Mirror loopbacked packets instead of trapping them · 2f4f4494

Ido Schimmel authored Dec 04, 2018

When the ASIC detects that a unicast packet is routed through the same
router interface (RIF) from which it ingressed (iRIF == eRIF), it raises
a trap called loopback error (LBERROR).

Thus far, this trap was configured to send a sole copy of the packet to
the CPU so that ICMP redirect packets could be potentially generated by
the kernel.

This is problematic as the CPU cannot forward packets at 3.2Tb/s and
there are scenarios (e.g., "one-armed router") where iRIF == eRIF is not
an exception.

Solve this by changing the trap to send a copy of the packet to the CPU.
To prevent the kernel from forwarding the packet again, it is marked
with 'offload_l3_fwd_mark'.

The trap is configured in a trap group of its own with a dedicated
policer in order not to prevent packets trapped by other traps from
reaching the CPU.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2f4f4494

net: Do not route unicast IP packets twice · f839a6c9

Ido Schimmel authored Dec 04, 2018

Packets marked with 'offload_l3_fwd_mark' were already forwarded by a
capable device and should not be forwarded again by the kernel.
Therefore, have the kernel consume them.

The check is performed in ip{,6}_forward_finish() in order to allow the
kernel to process such packets in ip{,6}_forward() and generate required
exceptions. For example, ICMP redirects.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f839a6c9

skbuff: Rename 'offload_mr_fwd_mark' to 'offload_l3_fwd_mark' · 875e8939

Ido Schimmel authored Dec 04, 2018

Commit abf4bb6b ("skbuff: Add the offload_mr_fwd_mark field") added
the 'offload_mr_fwd_mark' field to indicate that a packet has already
undergone L3 multicast routing by a capable device. The field is used to
prevent the kernel from forwarding a packet through a netdev through
which the device has already forwarded the packet.

Currently, no unicast packet is routed by both the device and the
kernel, but this is about to change by subsequent patches and we need to
be able to mark such packets, so that they will no be forwarded twice.

Instead of adding yet another field to 'struct sk_buff', we can just
rename 'offload_mr_fwd_mark' to 'offload_l3_fwd_mark', as a packet
either has a multicast or a unicast destination IP.

While at it, add a comment about both 'offload_fwd_mark' and
'offload_l3_fwd_mark'.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

875e8939

net: marvell: convert to DEFINE_SHOW_ATTRIBUTE · d9bbd6a1

Yangtao Li authored Dec 03, 2018

Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code.
Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d9bbd6a1

net: qca_spi: convert to DEFINE_SHOW_ATTRIBUTE · 25079154

Yangtao Li authored Dec 03, 2018

Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code.
Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Acked-by: Stefan Wahren <stefan.wahren@i2se.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

25079154

net: stmmac: convert to DEFINE_SHOW_ATTRIBUTE · fb0d9c63

Yangtao Li authored Dec 03, 2018

Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code.
Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fb0d9c63

nfp: convert to DEFINE_SHOW_ATTRIBUTE · 6f6c74fa

Yangtao Li authored Dec 03, 2018

Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code.
Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6f6c74fa

Merge branch 'mlx4_core-cleanups' · 76eb6ea4

David S. Miller authored Dec 03, 2018

Tariq Toukan says:

====================
mlx4_core cleanups

This patchset by Erez contains cleanups to the mlx4_core driver.

Patch 1 replaces -EINVAL with -EOPNOTSUPP for unsupported operations.
Patch 2 fixes some coding style issues.

Series generated against net-next commit:
97e6c858 net: usb: aqc111: Initialize wol_cfg with memset in aqc111_suspend
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

76eb6ea4

net/mlx4_core: Fix several coding style errors · 92a59ad0

Erez Alfasi authored Dec 02, 2018

Fix 3 checkpatch errors within mlx4/main.c:
- Unnecessary mlx4_debug_level global variable initialization to 0.
- Prohibited space before comma.
- Whitespaces instead of tab.
Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

92a59ad0

net/mlx4_core: Fix return codes of unsupported operations · 95aac2cd

Erez Alfasi authored Dec 02, 2018

Functions __set_port_type and mlx4_check_port_params returned
-EINVAL while the proper return code is -EOPNOTSUPP as a
result of an unsupported operation. All drivers should generate
this and all users should check for it when detecting an
unsupported functionality.
Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

95aac2cd

net: phy: Also request modules for C45 IDs · 30fcd6a9

Jose Abreu authored Dec 02, 2018

Logic of phy_device_create() requests PHY modules according to PHY ID
but for C45 PHYs we use different field for the IDs.

Let's also request the modules for these IDs.

Changes from v1:
- Only request C22 modules if C45 are not present (Andrew)
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Joao Pinto <joao.pinto@synopsys.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

30fcd6a9

Merge branch 'octeontx2-next' · 3eaf3ca6

David S. Miller authored Dec 03, 2018

Jerin Jacob says:

====================
octeontx2-af: NIX and NPC enhancements

This patchset is a continuation to earlier submitted four patch
series to add a new driver for Marvell's OcteonTX2 SOC's
Resource virtualization unit (RVU) admin function driver.

1. octeontx2-af: Add RVU Admin Function driver
   https://www.spinics.net/lists/netdev/msg528272.html
2. octeontx2-af: NPA and NIX blocks initialization
   https://www.spinics.net/lists/netdev/msg529163.html
3. octeontx2-af: NPC parser and NIX blocks initialization
   https://www.spinics.net/lists/netdev/msg530252.html
4. octeontx2-af: NPC MCAM support and FLR handling
   https://www.spinics.net/lists/netdev/msg534392.html

This patch series adds support for below

NPC block:
- Add NPC(mkex) profile support for various Key extraction configurations

NIX block:
- Enable dynamic RSS flow key algorithm configuration
- Enhancements on Rx checksum and error checks
- Add support for Tx packet marking support
- TL1 schedule queue allocation enhancements
- Add LSO format configuration mbox
- VLAN TPID configuration
- Skip multicast entry init for broadcast tables

v2:

- Rename FLOW_* to NIX_FLOW_* to avoid serious global namespace collisions,
as we have various FLOW_* definitions coming from
include/uapi/linux/pkt_cls.h for example.(David Miller)
- Pack the arguments of rvu_get_tl1_schqs() function
as 80 columns allows.(David Miller)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

3eaf3ca6

octeontx2-af: Enable mkex profile · 23705adb

Vamsi Attunuru authored Dec 02, 2018

The following set of NPC registers allow the driver to configure NPC
to generate different key value schemes to compare against packet
payload in MCAM search.

NPC_AF_INTF(0..1)_KEX_CFG
NPC_AF_KEX_LDATA(0..1)_FLAGS_CFG
NPC_AF_INTF(0..1)_LID(0..7)_LT(0..15)_LD(0..1)_CFG
NPC_AF_INTF(0..1)_LDATA(0..1)_FLAGS(0..15)_CFG

Currently, the AF driver populates these registers to
configure the default values to address the most common
use cases such as key generation for channel number + DMAC.

The secure firmware stores different configuration
value of these registers to enable different NPC use case
along with the name for the lookup.

Patch loads profile binary from secure firmware over
the exiting CGX mailbox interface and apply the profile.

AF driver shall fall back to the default configuration
in case of any errors.

The AF consumer driver can know the selected profile
on response to NPC_GET_KEX_CFG mailbox by introducing
mkex_pfl_name in the struct npc_get_kex_cfg_rsp.
Signed-off-by: Vamsi Attunuru <vamsi.attunuru@marvell.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

23705adb

octeontx2-af: Add LSO format configuration mailbox · da5d32e1

Nithin Dabilpuram authored Dec 02, 2018

NIX_AF_LSO_FORMAT(0..31)_FIELD(0..7) register enables an SW defined
means to define LSO packet modification formats.

0..31 works as an index to choose the algorithm, On success, the mailbox
returns the index to the client of chosen LSO algorithm selection.
This index will be used in configuring the transmit descriptors.

Add mailbox interface to dynamically reserve and configure LSO format.

This commit also fixes 'sizem1' for NIX_LSOALG_TCP_FLAGS
to '1' i.e 2 Bytes.
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

da5d32e1

octeontx2-af: Add L3 and L4 packet verification mailbox · 159a8a67

Vidhya Raman authored Dec 02, 2018

Adds mailbox support for L4 checksum verification
and L3 and L4 length verification configuration.
Signed-off-by: Vidhya Raman <vraman@marvell.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

159a8a67

octeontx2-af: Configure VLAN TPIDs · a253933e

Nithin Dabilpuram authored Dec 02, 2018

Setup TPID's for vlan0 and vlan1 for Tx VLAN insertion offloads.
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a253933e

octeontx2-af: Add support for Tx packet marking · a27d7659

Krzysztof Kanas authored Dec 02, 2018

NIX_AF_MARK_FORMAT(0..127)_CTL register enables an SW defined
means to mark/insert various data in the packet based on
final packet color from traffic shaping HW.

0..127 works as an index to choose the algorithm. On success,
the mailbox returns the index to the client.

Add NIX_MARK_FORMAT_CFG mailbox which reserves mark format based on
tuple (offset, y_mask, y_val, r_mask, r_val)

If the tuple is requested again for mark format that was already
reserved, then it will be reused. If not it will reserve a new entry
if space is available.

Also on AF init commonly used marker format such as VLAN DEI, IPv4
ECN, IPv4 DSCP are reserved for AF consumers.
Signed-off-by: Krzysztof Kanas <kkanas@marvell.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a27d7659

octeontx2-af: Enable RSS with promiscuous mode · f9f2da46

Vamsi Attunuru authored Dec 02, 2018

This patch adds support for enabling RSS in promiscuous mode
if RSS is already requested by the AF client.
Signed-off-by: Vamsi Attunuru <vamsi.attunuru@marvell.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f9f2da46

octeontx2-af: Define all NIX_AF_RX_DEF_* registers · 7c91a92e

Jerin Jacob authored Dec 02, 2018

In order to support all NIX specific valid length errors and
checksum errors on Rx, Update all NIX_AF_RX_DEF_* registers.

Also sorted all registers in HRM definition order.
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7c91a92e

octeontx2-af: Enable inner IPv4 checksum and its error code · 962e1bd6

Jerin Jacob authored Dec 02, 2018

This patch enables the inner IPv4 checksum and
defines the error code for Rx inner and outer checksum errors.
Setting ERRCODE as 1 so that CQE descriptor can be embedded
valid checksum error code and the driver can interpret
checksum error as ERRLEV = LID + 1 and ERRCODE = 1.
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

962e1bd6

octeontx2-af: Allow freeing single TLx Tx schedule queue · e2703c5f

Nithin Dabilpuram authored Dec 02, 2018

The default behavior was to free all the TLx Tx schedule
queues. This patch adds support for freeing a single Tx
schedule queue if TXSCHQ_FREE_ALL flag is not set.
Signed-off-by: Krzysztof Kanas <kkanas@marvell.com>
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e2703c5f

octeontx2-af: Restrict TL1 allocation and configuration · 26dda7da

Nithin Dabilpuram authored Dec 02, 2018

TL1 is the root node in the scheduling hierarchy and
it is a global resource with a limited number.

This patch introduces restriction and validation on
the allocation of the TL1 nodes for the effective resource
sharing across the AF consumers.

- Limit TL1 allocation to 2 per lmac.
  One could be for the normal link and one for IEEE802.3br
  express link (Express Send DMA).
  Effectively all the VF's of an RVU PF(lmac) share the two TL1 schqs.
- TL1 cannot be freed once allocated.
- Allow VF's to only apply default config to TL1 if not
  already applied. PF's can always overwrite the TL1 config.
- Consider NIX_AQ_INSTOP_WRITE while validating txschq
  when sq.ena is set.
Signed-off-by: Krzysztof Kanas <kkanas@marvell.com>
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

26dda7da

octeontx2-af: Add support for runtime RSS algo index reservation · 7ee74697

Jerin Jacob authored Dec 02, 2018

Introduced reserve_flowkey_alg_idx()to reserve RSS algorithm index,
it would internally use set_flowkey_fields() to generate fields
based on the flow key dynamically.

On AF driver init, it would reserve a predefined set RSS algo indexes,
which will be available all the time for all the AF driver consumers.
The leftover algo indexes can be reserved at runtime through
exiting nix_rss_flowkey_cfg mailbox message.

The NIX_FLOW_KEY_TYPE_PORT is removed from predefined a set of RSS flow
type as it is not used by any consumer.
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7ee74697

octeontx2-af: Add support for dynamic flow cfg to RSS field generation · b648366c

Jerin Jacob authored Dec 02, 2018

Introduce state-based algorithm to convert the flow_key value
to RSS algo field used by NIX_AF_RX_FLOW_KEY_ALGX_FIELDX register.

The outer `for loop` goes over _all_ protocol field and the following
variables depict the state machine forward progress logic.

a) keyoff_marker - Enabled when hash byte length needs to be accounted
in field->key_offset update.
b) field_marker - Enabled when a new field needs to be selected.
c) group_member - Enabled when a protocol is part of a group.

This would remove the existing hard coding and enable to add
new protocol support seamlessly.
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b648366c

octeontx2-af: Add response for RSS flow key cfg message · bd522d68

Jerin Jacob authored Dec 02, 2018

Added response for nix_rss_flowkey_cfg message to return
selected RSS algorithm index.

The FLOW_KEY_TYPE* definition is part of the mbox message and
it will be used by the other consumers of AF driver hence moving to mbox.h.

Also renamed FLOW_* definitions to NIX_FLOW_* to avoid global
name space collisions, as we have various coming from
include/uapi/linux/pkt_cls.h for example.
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bd522d68

octeontx2-af: Skip NIXLF check for bcast MCE entry · c5e4e4d1

Sunil Goutham authored Dec 02, 2018

At the time of initial broadcast packet replication table init,
NIXLFs are not yet attached to PF_FUNCs. Hence skipped checking
NIXLF while submitting MCE entry init instruction to NIX admin queue.

Also did a minor cleanup while installing bcast match entry in
packet parser unit i.e NPC.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c5e4e4d1

03 Dec, 2018 3 commits

Merge branch 'udp-msg_zerocopy' · 6e360f73

David S. Miller authored Dec 03, 2018

Willem de Bruijn says:

====================
udp msg_zerocopy

Enable MSG_ZEROCOPY for udp sockets

Patch 1/3 is the main patch, a rework of RFC patch
  http://patchwork.ozlabs.org/patch/899630/
  more details in the patch commit message

Patch 2/3 is an optimization to remove a branch from the UDP hot path
  and refcount_inc/refcount_dec_and_test pair when zerocopy is used.
  This used to be included in the first patch in v2.

Patch 3/3 runs the already existing udp zerocopy tests
  as part of kselftest

See also recent Linux Plumbers presentation
  https://linuxplumbersconf.org/event/2/contributions/106/attachments/104/128/willemdebruijn-lpc2018-udpgso-presentation-20181113.pdf

Changes:
  v1 -> v2
    - Fixup reverse christmas tree violation
  v2 -> v3
    - Split refcount avoidance optimization into separate patch
      - Fix refcount leak on error in fragmented case
        (thanks to Paolo Abeni for pointing this one out!)
      - Fix refcount inc on zero
  v3 -> v4
    - Move skb_zcopy_set below the only kfree_skb that might cause
      a premature uarg destroy before skb_zerocopy_put_abort
      - Move the entire skb_shinfo assignment block, to keep that
	cacheline access in one place
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

6e360f73

selftests: extend zerocopy tests to udp · db63e489

Willem de Bruijn authored Nov 30, 2018

Both msg_zerocopy and udpgso_bench have udp zerocopy variants.
Exercise these as part of the standard kselftest run.

With udp, msg_zerocopy has no control channel. Ensure that the
receiver exits after the sender by accounting for the initial
delay in starting them (in msg_zerocopy.sh).
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

db63e489

udp: elide zerocopy operation in hot path · 52900d22

Willem de Bruijn authored Nov 30, 2018

With MSG_ZEROCOPY, each skb holds a reference to a struct ubuf_info.
Release of its last reference triggers a completion notification.

The TCP stack in tcp_sendmsg_locked holds an extra ref independent of
the skbs, because it can build, send and free skbs within its loop,
possibly reaching refcount zero and freeing the ubuf_info too soon.

The UDP stack currently also takes this extra ref, but does not need
it as all skbs are sent after return from __ip(6)_append_data.

Avoid the extra refcount_inc and refcount_dec_and_test, and generally
the sock_zerocopy_put in the common path, by passing the initial
reference to the first skb.

This approach is taken instead of initializing the refcount to 0, as
that would generate error "refcount_t: increment on 0" on the
next skb_zcopy_set.

Changes
  v3 -> v4
    - Move skb_zcopy_set below the only kfree_skb that might cause
      a premature uarg destroy before skb_zerocopy_put_abort
      - Move the entire skb_shinfo assignment block, to keep that
        cacheline access in one place
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

52900d22