Commits · 7ae189759cc48cf8b54beebff566e9fd2d4e7d7c · Kirill Smelkov / linux

17 Jan, 2019 33 commits

tcp: always set retrans_stamp on recovery · 7ae18975

Yuchung Cheng authored Jan 16, 2019

Previously TCP socket's retrans_stamp is not set if the
retransmission has failed to send. As a result if a socket is
experiencing local issues to retransmit packets, determining when
to abort a socket is complicated w/o knowning the starting time of
the recovery since retrans_stamp may remain zero.

This complication causes sub-optimal behavior that TCP may use the
latest, instead of the first, retransmission time to compute the
elapsed time of a stalling connection due to local issues. Then TCP
may disrecard TCP retries settings and keep retrying until it finally
succeed: not a good idea when the local host is already strained.

The simple fix is to always timestamp the start of a recovery.
It's worth noting that retrans_stamp is also used to compare echo
timestamp values to detect spurious recovery. This patch does
not break that because retrans_stamp is still later than when the
original packet was sent.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7ae18975

tcp: always timestamp on every skb transmission · 7f12422c

Yuchung Cheng authored Jan 16, 2019

Previously TCP skbs are not always timestamped if the transmission
failed due to memory or other local issues. This makes deciding
when to abort a socket tricky and complicated because the first
unacknowledged skb's timestamp may be 0 on TCP timeout.

The straight-forward fix is to always timestamp skb on every
transmission attempt. Also every skb retransmission needs to be
flagged properly to avoid RTT under-estimation. This can happen
upon receiving an ACK for the original packet and the a previous
(spurious) retransmission has failed.

It's worth noting that this reverts to the old time-stamping
style before commit 8c72c65b ("tcp: update skb->skb_mstamp more
carefully") which addresses a problem in computing the elapsed time
of a stalled window-probing socket. The problem will be addressed
differently in the next patches with a simpler approach.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7f12422c

tcp: exit if nothing to retransmit on RTO timeout · 88f8598d

Yuchung Cheng authored Jan 16, 2019

Previously TCP only warns if its RTO timer fires and the
retransmission queue is empty, but it'll cause null pointer
reference later on. It's better to avoid such catastrophic failure
and simply exit with a warning.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

88f8598d

net: phy: micrel: use phy_read_mmd and phy_write_mmd · 9b420eff

Heiner Kallweit authored Jan 16, 2019

This driver implements open-coded versions of phy_read_mmd() and
phy_write_mmd() for KSZ9031. That's not needed, let's use the
phylib functions directly.

This is compile-tested only because I have no such hardware.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

9b420eff

davicom: Annotate implicit fall through in dm9000_set_io · 43deda54

Mathieu Malaterre authored Jan 16, 2019

There is a plan to build the kernel with -Wimplicit-fallthrough and
this place in the code produced a warning (W=1).

This commit removes the following warning:

include/linux/device.h:1480:5: warning: this statement may fall through [-Wimplicit-fallthrough=]
drivers/net/ethernet/davicom/dm9000.c:397:3: note: in expansion of macro 'dev_dbg'
drivers/net/ethernet/davicom/dm9000.c:398:2: note: here
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

43deda54

net/ipv6/udp_tunnel: prefer SO_BINDTOIFINDEX over SO_BINDTODEVICE · 49b4994c

David Herrmann authored Jan 15, 2019

The udp-tunnel setup allows binding sockets to a network device. Prefer
the new SO_BINDTOIFINDEX to avoid temporarily resolving the device-name
just to look it up in the ioctl again.
Reviewed-by: Tom Gundersen <teg@jklm.no>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

49b4994c

net/ipv4/udp_tunnel: prefer SO_BINDTOIFINDEX over SO_BINDTODEVICE · 2eadee72

David Herrmann authored Jan 15, 2019

The udp-tunnel setup allows binding sockets to a network device. Prefer
the new SO_BINDTOIFINDEX to avoid temporarily resolving the device-name
just to look it up in the ioctl again.
Reviewed-by: Tom Gundersen <teg@jklm.no>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2eadee72

net: introduce SO_BINDTOIFINDEX sockopt · f5dd3d0c

David Herrmann authored Jan 15, 2019

This introduces a new generic SOL_SOCKET-level socket option called
SO_BINDTOIFINDEX. It behaves similar to SO_BINDTODEVICE, but takes a
network interface index as argument, rather than the network interface
name.

User-space often refers to network-interfaces via their index, but has
to temporarily resolve it to a name for a call into SO_BINDTODEVICE.
This might pose problems when the network-device is renamed
asynchronously by other parts of the system. When this happens, the
SO_BINDTODEVICE might either fail, or worse, it might bind to the wrong
device.

In most cases user-space only ever operates on devices which they
either manage themselves, or otherwise have a guarantee that the device
name will not change (e.g., devices that are UP cannot be renamed).
However, particularly in libraries this guarantee is non-obvious and it
would be nice if that race-condition would simply not exist. It would
make it easier for those libraries to operate even in situations where
the device-name might change under the hood.

A real use-case that we recently hit is trying to start the network
stack early in the initrd but make it survive into the real system.
Existing distributions rename network-interfaces during the transition
from initrd into the real system. This, obviously, cannot affect
devices that are up and running (unless you also consider moving them
between network-namespaces). However, the network manager now has to
make sure its management engine for dormant devices will not run in
parallel to these renames. Particularly, when you offload operations
like DHCP into separate processes, these might setup their sockets
early, and thus have to resolve the device-name possibly running into
this race-condition.

By avoiding a call to resolve the device-name, we no longer depend on
the name and can run network setup of dormant devices in parallel to
the transition off the initrd. The SO_BINDTOIFINDEX ioctl plugs this
race.
Reviewed-by: Tom Gundersen <teg@jklm.no>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f5dd3d0c

tls: Fix recvmsg() to be able to peek across multiple records · 692d7b5d

Vakul Garg authored Jan 16, 2019

This fixes recvmsg() to be able to peek across multiple tls records.
Without this patch, the tls's selftests test case
'recv_peek_large_buf_mult_recs' fails. Each tls receive context now
maintains a 'rx_list' to retain incoming skb carrying tls records. If a
tls record needs to be retained e.g. for peek case or for the case when
the buffer passed to recvmsg() has a length smaller than decrypted
record length, then it is added to 'rx_list'. Additionally, records are
added in 'rx_list' if the crypto operation runs in async mode. The
records are dequeued from 'rx_list' after the decrypted data is consumed
by copying into the buffer passed to recvmsg(). In case, the MSG_PEEK
flag is used in recvmsg(), then records are not consumed or removed
from the 'rx_list'.
Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

692d7b5d

Merge branch 'dsa-lantiq_gswip-probe-fixes-and-remove-cleanup' · fb73d620

David S. Miller authored Jan 17, 2019

Johan Hovold says:

====================
net: dsa: lantiq_gswip: probe fixes and remove cleanup

This series fix a few issues found through inspection when fixing up new
bad uses of of_find_compatible_node() that have crept in since 4.19.

Note that these have only been compile tested.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

fb73d620

net: dsa: lantiq_gswip: drop bogus drvdata check · 8bb18f69

Johan Hovold authored Jan 16, 2019

The platform-device driver data is set on successful probe and will
never be NULL on remove (or we have much bigger problems).
Signed-off-by: Johan Hovold <johan@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

8bb18f69

net: dsa: lantiq_gswip: fix OF child-node lookups · c8cbcb0d

Johan Hovold authored Jan 16, 2019

Use the new of_get_compatible_child() helper to look up child nodes to
avoid ever matching non-child nodes elsewhere in the tree.

Also fix up the related struct device_node leaks.

Fixes: 14fceff4 ("net: dsa: Add Lantiq / Intel DSA driver for vrx200")
Cc: stable <stable@vger.kernel.org>	# 4.20
Cc: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: Johan Hovold <johan@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

c8cbcb0d

net: dsa: lantiq_gswip: fix use-after-free on failed probe · aed13f2e

Johan Hovold authored Jan 16, 2019

Make sure to disable and deregister the switch on late probe errors to
avoid use-after-free when the device-resource-managed switch is freed.

Fixes: 14fceff4 ("net: dsa: Add Lantiq / Intel DSA driver for vrx200")
Cc: stable <stable@vger.kernel.org>	# 4.20
Cc: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: Johan Hovold <johan@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

aed13f2e

sfc: extend MTD support for newer hardware · 5fb1beec

Bert Kenward authored Jan 16, 2019

The X2 family of NICs (based on the SFC9250) have additional
MTD partitions for firmware and configuration. This includes
partitions that are read-only.

The NICs also have extended versions of the NVRAM interface,
allowing more detailed status information to be returned.
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5fb1beec

selftests/tls: Fix recv partial/large_buff test cases · cea3bfb3

Vakul Garg authored Jan 16, 2019

TLS test cases recv_partial & recv_peek_large_buf_mult_recs expect to
receive a certain amount of data and then compare it against known
strings using memcmp. To prevent recvmsg() from returning lesser than
expected number of bytes (compared in memcmp), MSG_WAITALL needs to be
passed in recvmsg().
Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cea3bfb3

net: phy: check return code when requesting PHY driver module · 13d0ab67

Heiner Kallweit authored Jan 16, 2019

When requesting the PHY driver module fails we'll bind the genphy
driver later. This isn't obvious to the user and may cause, depending
on the PHY, different types of issues. Therefore check the return code
of request_module(). Note that we only check for failures in loading
the module, not whether a module exists for the respective PHY ID.

v2:
- add comment explaining what is checked and what is not
- return error from phy_device_create() if loading module fails
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

13d0ab67

net/tls: Make function tls_sw_do_sendpage static · 01cb8a1a

YueHaibing authored Jan 16, 2019

Fixes the following sparse warning:

 net/tls/tls_sw.c:1023:5: warning:
 symbol 'tls_sw_do_sendpage' was not declared. Should it be static?
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

01cb8a1a

net/tls: remove unused function tls_sw_sendpage_locked · f3de19af

YueHaibing authored Jan 16, 2019

There are no in-tree callers.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f3de19af

Optimize sk_msg_clone() by data merge to end dst sg entry · fda497e5

Vakul Garg authored Jan 16, 2019

Function sk_msg_clone has been modified to merge the data from source sg
entry to destination sg entry if the cloned data resides in same page
and is contiguous to the end entry of destination sk_msg. This improves
kernel tls throughput to the tune of 10%.

When the user space tls application calls sendmsg() with MSG_MORE, it leads
to calling sk_msg_clone() with new data being cloned placed continuous to
previously cloned data. Without this optimization, a new SG entry in
the destination sk_msg i.e. rec->msg_plaintext in tls_clone_plaintext_msg()
gets used. This leads to exhaustion of sg entries in rec->msg_plaintext
even before a full 16K of allowable record data is accumulated. Hence we
lose oppurtunity to encrypt and send a full 16K record.

With this patch, the kernel tls can accumulate full 16K of record data
irrespective of the size of data passed in sendmsg() with MSG_MORE.
Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fda497e5

net: hns: Use struct_size() in devm_kzalloc() · 4559dd24

Gustavo A. R. Silva authored Jan 15, 2019

One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:

struct foo {
    int stuff;
    struct boo entry[];
};

instance = devm_kzalloc(dev, sizeof(struct foo) + count * sizeof(struct boo), GFP_KERNEL);

Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:

instance = devm_kzalloc(dev, struct_size(instance, entry, count), GFP_KERNEL);

This code was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4559dd24

net: phy: Add helpers to determine if PHY driver is generic · 5db5ea99

Florian Fainelli authored Jan 15, 2019

We are already checking in phy_detach() that the PHY driver is of
generic kind (1G or 10G) and we are going to make use of that in the SFP
layer as well for 1000BaseT SFP modules, so expose helper functions to
return that information.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

5db5ea99

Merge branch 'dsa-Split-platform-data-to-header-file' · 6f24e159

David S. Miller authored Jan 17, 2019

Florian Fainelli says:

====================
net: dsa: Split platform data to header file

This patch series decouples the DSA platform data structures from
net/dsa.h which was getting used for all sorts of DSA related
structures.

It would probably make sense for this series to go via David's net-next
tree to avoid conflicts on the ARM part, since we cannot obviously
include a header that does not yet exist.

No functional changes intended.
====================
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

6f24e159

net: dsa: Include platform_data header file · 8cfb5faf

Florian Fainelli authored Jan 15, 2019

b53 and mv88e6xxx support passing platform_data, and now that we have
split the platform_data portion from the main net/dsa.h header file,
include only the relevant parts.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8cfb5faf

ARM: orion5x: Include platform_data/dsa.h · e5f02a31

Florian Fainelli authored Jan 15, 2019

Now that we have split the DSA platform data structures from the main
net/dsa.h header file, include only the relevant header file.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e5f02a31

net: dsa: Split platform data to header file · ecfc9372

Florian Fainelli authored Jan 15, 2019

Instead of having net/dsa.h contain both the internal switch tree/driver
structures, split the relevant platform_data parts into
include/linux/platform_data/dsa.h and make that header be included by
net/dsa.h in order not to break any setup. A subsequent set of patches
will update code including net/dsa.h to include only the platform_data
header.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ecfc9372

net-next/hinic: replace disable_irq_nosync/enable_irq · 905b464a

Xue Chaojing authored Jan 15, 2019

In order to avoid frequent system interrupts when sending and
receiving packets. we replace disable_irq_nosync/enable_irq
with hinic_set_msix_state(), hinic_set_msix_state is used to
access memory mapped hinic devices.
Signed-off-by: Xue Chaojing <xuechaojing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

905b464a

net: dsa: Add ndo_get_phys_port_name() for CPU port · da7b9e9b

Florian Fainelli authored Jan 15, 2019

There is not currently way to infer the port number through sysfs that
is being used as the CPU port number. Overlay a ndo_get_phys_port_name()
operation onto the DSA master network device in order to retrieve that
information.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

da7b9e9b

Documentation: networking: dsa: Update documentation · 44543f1d

Florian Fainelli authored Jan 15, 2019

Since 83c0afae ("net: dsa: Add new binding implementation"), DSA is
no longer a platform device exclusively and can support registering DSA
switches from other bus drivers (PCI, USB, I2C, etc.).
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

44543f1d

cxgb4/l2t: Use struct_size() in kvzalloc() · 78c787c2

Gustavo A. R. Silva authored Jan 15, 2019

One of the more common cases of allocation size calculations is finding the
size of a structure that has a zero-sized array at the end, along with memory
for some number of elements for that array. For example:

struct foo {
    int stuff;
    struct boo entry[];
};

instance = kvzalloc(sizeof(struct foo) + count * sizeof(struct boo), GFP_KERNEL);

Instead of leaving these open-coded and prone to type mistakes, we can now
use the new struct_size() helper:

instance = kvzalloc(struct_size(instance, entry, count), GFP_KERNEL);

This code was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

78c787c2

openvswitch: meter: Use struct_size() in kzalloc() · c5c3899d

Gustavo A. R. Silva authored Jan 15, 2019

One of the more common cases of allocation size calculations is finding the
size of a structure that has a zero-sized array at the end, along with memory
for some number of elements for that array. For example:

struct foo {
    int stuff;
    struct boo entry[];
};

instance = kzalloc(sizeof(struct foo) + count * sizeof(struct boo), GFP_KERNEL);

Instead of leaving these open-coded and prone to type mistakes, we can now
use the new struct_size() helper:

instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL);

This code was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c5c3899d

net: phy: don't include asm/irq.h directly · 3fcb3f9b

Heiner Kallweit authored Jan 15, 2019

There's no need to and one shouldn't include asm/irq.h directly.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3fcb3f9b

net: phy: improve logging in phylib · c3a6a174

Heiner Kallweit authored Jan 15, 2019

Some time ago phydev_info() and friends have been added. They allow to
improve and simplify logging.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c3a6a174

net: phy: remove preliminary workaround for not loading PHY driver · 1868e3d7

Heiner Kallweit authored Jan 15, 2019

This workaround attempt helped for some but not all affected users.
With commit 11287b69 ("r8169: load Realtek PHY driver module
before r8169") we have a better workaround now, so we an remove
the first attempt.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1868e3d7

16 Jan, 2019 7 commits

Merge branch 'nfp-flower-improve-flower-resilience' · 159882f4

David S. Miller authored Jan 16, 2019

Jakub Kicinski says:

====================
nfp: flower: improve flower resilience

This series contains mostly changes which improve nfp flower
offload's resilience, but are too large or risky to push into net.

Fred makes the driver waits for flower FW responses uninterruptible,
and a little longer (~40ms).

Pieter adds support for cards with multiple rule memories.

John reworks the MAC offloads.  He says:
> When potential tunnel end-point MACs are offloaded, they are assigned an
> index. This index may be associated with a port number meaning that if a
> packet matches an offloaded MAC address on the card, then the ingress
> port for that MAC can also be verified. In the case of shared MACs (e.g.
> on a linux bond) there may be situations where this index maps to only
> one of the ports that share the MAC.
>
> The idea of 'global' MAC indexes are supported that bypass the check on
> ingress port on the NFP. The patchset tracks shared MACs and assigns
> global indexes to these. It also ensures that port based indexes are
> re-applied if a single port becomes the only user of an offloaded MAC.
>
> Other patches in the set aim to tidy code without changing functionality.
> There is also a delete offload message introduced to ensure that MACs no
> longer in use in kernel space are removed from the firmware lookup tables.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

159882f4

nfp: flower: enable MAC address sharing for offloadable devs · 20cce886

John Hurley authored Jan 15, 2019

A MAC address is not necessarily a unique identifier for a netdev. Drivers
such as Linux bonds, for example, can apply the same MAC address to the
upper layer device and all lower layer devices.

NFP MAC offload for tunnel decap includes port verification for reprs but
also supports the offload of non-repr MAC addresses by assigning 'global'
indexes to these. This means that the FW will not verify the incoming port
of a packet matching this destination MAC.

Modify the MAC offload logic to assign global indexes based on MAC address
instead of net device (as it currently does). Use this to allow multiple
devices to share the same MAC. In other words, if a repr shares its MAC
address with another device then give the offloaded MAC a global index
rather than associate it with an ingress port. Track this so that changes
can be reverted as MACs stop being shared.

Implement this by removing the current list based assignment of global
indexes and replacing it with an rhashtable that maps an offloaded MAC
address to the number of devices sharing it, distributing global indexes
based on this.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

20cce886

nfp: flower: ensure MAC cleanup on address change · 13cf7103

John Hurley authored Jan 15, 2019

It is possible to receive a MAC address change notification without the
net device being down (e.g. when an OvS bridge is assigned the same MAC as
a port added to it). This means that an offloaded MAC address may not be
removed if its device gets a new address.

Maintain a record of the offloaded MAC addresses for each repr and netdev
assigned a MAC offload index. Use this to delete the (now expired) MAC if
a change of address event occurs. Only handle change address events if the
device is already up - if not then the netdev up event will handle it.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

13cf7103

nfp: flower: add infastructure for non-repr priv data · 05d2bee6

John Hurley authored Jan 15, 2019

NFP repr netdevs contain private data that can store per port information.
In certain cases, the NFP driver offloads information from non-repr ports
(e.g. tunnel ports). As the driver does not have control over non-repr
netdevs, it cannot add/track private data directly to the netdev struct.

Add infastructure to store private information on any non-repr netdev that
is offloaded at a given time. This is used in a following patch to track
offloaded MAC addresses for non-reprs and enable correct house keeping on
address changes.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

05d2bee6

nfp: flower: ensure deletion of old offloaded MACs · 49402b0b

John Hurley authored Jan 15, 2019

When a potential tunnel end point goes down then its MAC address should
not be matchable on the NFP.

Implement a delete message for offloaded MACs and call this on net device
down. While at it, remove the actions on register and unregister netdev
events. A MAC should only be offloaded if the device is up. Note that the
netdev notifier will replay any notifications for UP devices on
registration so NFP can still offload ports that exist before the driver
is loaded. Similarly, devices need to go down before they can be
unregistered so removal of offloaded MACs is only required on down events.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

49402b0b

nfp: flower: remove list infastructure from MAC offload · 0115dcc3

John Hurley authored Jan 15, 2019

Potential MAC destination addresses for tunnel end-points are offloaded to
firmware. This was done by building a list of such MACs and writing to
firmware as blocks of addresses.

Simplify this code by removing the list format and sending a new message
for each offloaded MAC.

This is in preparation for delete MAC messages. There will be one delete
flag per message so we cannot assume that this applies to all addresses
in a list.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0115dcc3

nfp: flower: ignore offload of VF and PF repr MAC addresses · 41da0b5e

John Hurley authored Jan 15, 2019

Currently MAC addresses of all repr netdevs, along with selected non-NFP
controlled netdevs, are offloaded to FW as potential tunnel end-points.
However, the addresses of VF and PF reprs are meaningless outside of
internal communication and it is only those of physical port reprs
required.

Modify the MAC address offload selection code to ignore VF/PF repr devs.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

41da0b5e