Commits · 0fbe82e628c817e292ff588cd5847fc935e025f2 · nexedi / linux

08 Dec, 2018 2 commits

net: call sk_dst_reset when set SO_DONTROUTE · 0fbe82e6

yupeng authored Dec 05, 2018

after set SO_DONTROUTE to 1, the IP layer should not route packets if
the dest IP address is not in link scope. But if the socket has cached
the dst_entry, such packets would be routed until the sk_dst_cache
expires. So we should clean the sk_dst_cache when a user set
SO_DONTROUTE option. Below are server/client python scripts which
could reprodue this issue:

server side code:

==========================================================================
import socket
import struct
import time

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(('0.0.0.0', 9000))
s.listen(1)
sock, addr = s.accept()
sock.setsockopt(socket.SOL_SOCKET, socket.SO_DONTROUTE, struct.pack('i', 1))
while True:
    sock.send(b'foo')
    time.sleep(1)
==========================================================================

client side code:
==========================================================================
import socket
import time

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('server_address', 9000))
while True:
    data = s.recv(1024)
    print(data)
==========================================================================
Signed-off-by: yupeng <yupeng0921@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0fbe82e6

neighbor: Improve garbage collection · 58956317

David Ahern authored Dec 07, 2018

The existing garbage collection algorithm has a number of problems:

1. The gc algorithm will not evict PERMANENT entries as those entries
   are managed by userspace, yet the existing algorithm walks the entire
   hash table which means it always considers PERMANENT entries when
   looking for entries to evict. In some use cases (e.g., EVPN) there
   can be tens of thousands of PERMANENT entries leading to wasted
   CPU cycles when gc kicks in. As an example, with 32k permanent
   entries, neigh_alloc has been observed taking more than 4 msec per
   invocation.

2. Currently, when the number of neighbor entries hits gc_thresh2 and
   the last flush for the table was more than 5 seconds ago gc kicks in
   walks the entire hash table evicting *all* entries not in PERMANENT
   or REACHABLE state and not marked as externally learned. There is no
   discriminator on when the neigh entry was created or if it just moved
   from REACHABLE to another NUD_VALID state (e.g., NUD_STALE).

   It is possible for entries to be created or for established neighbor
   entries to be moved to STALE (e.g., an external node sends an ARP
   request) right before the 5 second window lapses:

        -----|---------x|----------|-----
            t-5         t         t+5

   If that happens those entries are evicted during gc causing unnecessary
   thrashing on neighbor entries and userspace caches trying to track them.

   Further, this contradicts the description of gc_thresh2 which says
   "Entries older than 5 seconds will be cleared".

   One workaround is to make gc_thresh2 == gc_thresh3 but that negates the
   whole point of having separate thresholds.

3. Clearing *all* neigh non-PERMANENT/REACHABLE/externally learned entries
   when gc_thresh2 is exceeded is over kill and contributes to trashing
   especially during startup.

This patch addresses these problems as follows:

1. Use of a separate list_head to track entries that can be garbage
   collected along with a separate counter. PERMANENT entries are not
   added to this list.

   The gc_thresh parameters are only compared to the new counter, not the
   total entries in the table. The forced_gc function is updated to only
   walk this new gc_list looking for entries to evict.

2. Entries are added to the list head at the tail and removed from the
   front.

3. Entries are only evicted if they were last updated more than 5 seconds
   ago, adhering to the original intent of gc_thresh2.

4. Forced gc is stopped once the number of gc_entries drops below
   gc_thresh2.

5. Since gc checks do not apply to PERMANENT entries, gc levels are skipped
   when allocating a new neighbor for a PERMANENT entry. By extension this
   means there are no explicit limits on the number of PERMANENT entries
   that can be created, but this is no different than FIB entries or FDB
   entries.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

58956317

07 Dec, 2018 33 commits

Merge branch 'hns3-error-handling' · 12edfdfc

David S. Miller authored Dec 07, 2018

Salil Mehta says:

====================
net: hns3: Additions/optimizations related to HNS3 H/W err handling

This patch set primarily does following addtions and optimizations
related to error handling in HNS3 Ethernet driver:

 1. Name changes for enable and process functions and minor loop
    optimizations. [PATCH 1-6]
 2. Modify query and clearing of RAS errors using new set of commands
    because modules specific commands for clearing RCB PPP PF, SSU are
    obselete. [PATCH 7]
 3. Deletes logging 1-bit errors for RAS in HNS3 driver as these never
    get reported to the driver. [PATCH 8]
 4. Add handling of NIC hw errors reported through MSIx rather than
    PCIe AER channel. [PATCH 9]
 5. Add handling for the HW RAS and MSIx errors in the modules MAC, PPP
    PF, MSIx SRAM, RCB and SSU. [PATCH 10-13]
 6. Add handling of RoCEE RAS errors. [PATCH 14]
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

12edfdfc

net: hns3: add handling of RDMA RAS errors · 630ba007

Shiju Jose authored Dec 07, 2018

This patch handles the RDMA RAS errors.
1. Enable RAS interrupt, print error detail info and clear error status.
2. Do CORE reset to recovery when these non-fatal errors happened.
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

630ba007

net: hns3: handle hw errors of SSU · c3529177

Shiju Jose authored Dec 07, 2018

This patch enables and handles hw errors of the Storage Switch Unit(SSU).
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c3529177

net: hns3: handle hw errors of PPU(RCB) · f69b10b3

Shiju Jose authored Dec 07, 2018

This patch enables and handles hw RAS and MSIx errors of PPU(RCB).
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f69b10b3

net: hns3: handle hw errors of PPP PF · 8fc9d3e3

Shiju Jose authored Dec 07, 2018

This patch handles PF hw errors of PPP(Programmable Packet Processor).
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8fc9d3e3

net: hns3: add handling of hw errors of MAC · 7838f908

Shiju Jose authored Dec 07, 2018

This patch adds enable and handling of hw errors of
the MAC block.
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7838f908

net: hns3: add handling of hw errors reported through MSIX · f6162d44

Salil Mehta authored Dec 07, 2018

This patch adds handling for HNS3 hardware errors(non-standard)
which are reported through MSIX interrupts and not through
PCIe AER channel.

These MSIX reported hardware errors are handled using common
misc. interrupt handler. Hardware error related registers
cannot be cleared in context to the interrupt received as
they require *heavy* access to hardware using IMP(Integrated
Mangement Processor) commands. Hence, we defer the clearing
of such error events till later time.

Since, we have defered exact identification of errors we
will have to defer the level of receovery/reset which
might be required. Hence, a new reset type UNKNOWN reset
has been introduced which effectively defers the assertion
of the reset till we get hold of kind of errors at later
time.
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f6162d44

net: hns3: deleted logging 1 bit errors · 8bb14792

Shiju Jose authored Dec 07, 2018

This patch deletes logging 1 bit errors for the following reasons.
1. AER does not notify 1 bit errors to the device drivers.
   However AER reports 1 bit errors to the userspace through the
   trace_aer_event for logging in the rasdaemon.
2. Firmware clears the status of 1 bit errors in the hw registers.
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8bb14792

net: hns3: add handling of hw ras errors using new set of commands · 332fbf57

Shiju Jose authored Dec 07, 2018

1. This patch adds handling of hw ras errors using new set of
   common commands.
2. Updated the error message tables to match the register's name and
   error status returned by the commands.
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

332fbf57

net: hns3: add optimization in the hclge_hw_error_set_state · 481a626a

Shiju Jose authored Dec 07, 2018

1. This patch adds minor loop optimization in the
   hclge_hw_error_set_state function.
2. Adds logging module's name if it fails to configure the
   error interrupts.
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

481a626a

net: hns3: rename process_hw_error function · 381c356e

Shiju Jose authored Dec 07, 2018

This patch renames process_hw_error function to
handle_hw_ras_error function to match the purpose
of the function. This is because hw errors reported through
ras and msix interrupts will be handled separately.
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

381c356e

net: hns3: deletes unnecessary settings of the descriptor data · 166b04c3

Shiju Jose authored Dec 07, 2018

This patch deletes unnecessary setting of the descriptor data
to 0 for disabling error interrupts because
it is already done by the hclge_cmd_setup_basic_desc function.
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

166b04c3

net: hns3: re-enable error interrupts on hw reset · f3fa4a94

Shiju Jose authored Dec 07, 2018

This patch adds calling hclge_hw_error_set_state function
to re-enable the error interrupts those will be disabled on
the hw reset.
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f3fa4a94

net: hns3: rename enable error interrupt functions · 98da4027

Shiju Jose authored Dec 07, 2018

This patch
- renames the enable error interrupt functions.
  The reason is that these functions
  are used for both enable and disable error interrupts.

- removes redundant logs from the enable error interrupt functions.
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

98da4027

net: hns3: remove existing process error functions and reorder hw_blk table · fe0f7d69

Shiju Jose authored Dec 07, 2018

1.The command interface for queryng and clearing hw errors is
  changed, which requires the new process error functions to be added.
  This patch removes all the current process error functions and
  associated definitions. The new functions to handle ras errors
  would be added in this patch set.

2. Fixed order issue of the hw_blk table.
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fe0f7d69

Merge branch 'mlxsw-Un-offload-FDB-on-NVE-detach-attach' · 9f4c2cff

David S. Miller authored Dec 07, 2018

Ido Schimmel says:

====================
mlxsw: Un/offload FDB on NVE detach/attach

Petr says:

When a VXLAN device is attached to a bridge of a driver capable of
offloading such, or upped, the FDB entries already present at the device
need to be offloaded. Similarly when an offloaded VXLAN device ceases
being interesting (it is downed, or detached, or a front-panel port
netdevice is detached from the bridge that the VXLAN device is attached
to), any offloaded FDB entries need to be unoffloaded and unmarked. This
attach / detach processing is implemented in this patchset.

In patch #1, a code pattern is extracted into a named function for
easier reuse.

In patch #2, vxlan_fdb_replay() is added to send
SWITCHDEV_VXLAN_FDB_ADD_TO_DEVICE for each FDB entry with a given VNI.
The intention is that the offloading driver will interpret these events
like any other and thus offload the FDB entries that existed prior to
VXLAN attach.

In patches #3 and #4, the functions vxlan_fdb_clear_offload() resp.
br_fdb_clear_offload() are added. These clear the offloaded flag at
matching FDB entries.

In patches #5-#9, we introduce FID-type-specific and NVE-type-specific
ops necessary to properly abstract invocations of the replay/clear
functions.

Finally patch #10 implements the FDB management.

In patch #11, the mlxsw-specific test case is extended to check that the
management of offload marks under the newly-supported situations is
correct. Patch #12, from Ido, exercises the new code paths in actual
functional test.

v2:
- Patch #1:
    - Modify vxlan_fdb_switchdev_notifier_info() to initialize the
      structure through a passed-in pointer argument, instead of returning
      it as a value.
- Patch #2:
    - Adapt to API change in vxlan_fdb_switchdev_notifier_info()
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

9f4c2cff

selftests: forwarding: Add PVID test case for VXLAN with VLAN-aware bridges · 55939b26

Ido Schimmel authored Dec 07, 2018

When using VLAN-aware bridges with VXLAN, the VLAN that is mapped to the
VNI of the VXLAN device is that which is configured as "pvid untagged"
on the corresponding bridge port.

When these flags are toggled or when the VLAN is deleted entirely,
remote hosts should not be able to receive packets from the VTEP.

Add a test case for above mentioned scenarios.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

55939b26

selftests: mlxsw: vxlan: Test FDB un/marking on VXLAN join/leave · 0efe9ed9

Petr Machata authored Dec 07, 2018

When a VXLAN device is attached to an offloaded bridge, or when a
front-panel port is attached to a bridge that already has a VXLAN
device, mlxsw should offload the existing offloadable FDB entries.
Similarly when VXLAN device is downed, the FDB entries are unoffloaded,
and the marks thus need to be cleared. Similarly when a front-panel port
device is attached to a bridge with a VXLAN device, or when VLAN flags
are tweaked on a VXLAN port attached to a VLAN-aware bridge.

Test that the replaying / clearing logic works by observing transitions
in presence of offload marks under different scenarios.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0efe9ed9

mlxsw: spectrum_nve: Un/offload FDB on nve_fid_disable/enable · 8a5969d8

Petr Machata authored Dec 07, 2018

Any existing NVE FDB entries need to be offloaded when NVE is enabled
for a given FID. Recent patches have added fdb_replay op for this, so
just invoke it from mlxsw_sp_nve_fid_enable().

When NVE is disabled on a FID, any existing FDB offloaded marks need to
be cleared on NVE device as well as on its bridge master. An op to
handle this, fdb_clear_offload, has been added to FID ops and NVE ops in
previous patches. Add code to resolve the NVE device, NVE type, and
dispatch to both fdb_clear_offload ops.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8a5969d8

mlxsw: spectrum: Add mlxsw_sp_fid_ops.fdb_clear_offload · 83de7883

Petr Machata authored Dec 07, 2018

If there are any offloaded FDB entries at bridge master of an NVE device
at the time that it's un-offloaded, their offloaded marks need to be
cleared. How that is done depends on whether the bridge in question is
vlan aware. Therefore add a per-FID-type operation.

Implement the operation for the 802.1q and 802.1d bridges.

Add and publish a function mlxsw_sp_fid_fdb_clear_offload() to dispatch
to the new operation according to FID type.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

83de7883

mlxsw: spectrum_nve: Add mlxsw_sp_nve_ops.fdb_clear_offload · b73ef0e0

Petr Machata authored Dec 07, 2018

If there are any offloaded FDB entries at an NVE device at the time that
it's un-offloaded, their offloaded marks need to be cleared. How that is
done depends on NVE device type, and therefore add a per-NVE-type
operation.

Implement the operation for the sole NVE device type currently supported
by mlxsw, VXLAN.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b73ef0e0

mlxsw: spectrum_nve: Add mlxsw_sp_nve_ops.fdb_replay · a6ef5a48

Petr Machata authored Dec 07, 2018

A replay of FDB needs to be performed so that the FDB entries existing
at the NVE device are offloaded. How the replay is done depends on NVE
device type, and therefore add a per-NVE-type operation.

Implement the operation for the sole NVE device type currently supported
by mlxsw, VXLAN.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a6ef5a48

mlxsw: spectrum_switchdev: Publish mlxsw_sp_switchdev_notifier · 34139ede

Petr Machata authored Dec 07, 2018

The notifier block will need to be passed to vxlan_fdb_replay() in a
follow-up patch.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

34139ede

mlxsw: spectrum: Track NVE type at FIDs · 2a36c125

Petr Machata authored Dec 07, 2018

A follow-up patch will add support for replay and for clearing of
offload marks. These are NVE type-sensitive operations, and to be able
to dispatch them properly, a FID needs to know what NVE type is attached
to it.

Therefore, track the NVE type at struct mlxsw_sp_fid. Extend
mlxsw_sp_fid_vni_set() to take it as an argument, and add
mlxsw_sp_fid_nve_type().
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2a36c125

bridge: Add br_fdb_clear_offload() · 43920edf

Petr Machata authored Dec 07, 2018

When a driver unoffloads all FDB entries en bloc, it's inefficient to
send the switchdev notification one by one. Add a helper that unsets the
offload flag on FDB entries on a given bridge port and VLAN.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

43920edf

vxlan: Add vxlan_fdb_clear_offload() · e5ff4b19

Petr Machata authored Dec 07, 2018

When a driver unoffloads all FDB entries en bloc, it's inefficient to
send the switchdev notification one by one. Add a helper that walks the
FDB table, unsetting the offload flag on RDST with a given VNI.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e5ff4b19

vxlan: Add vxlan_fdb_replay() · 4f89f5b5

Petr Machata authored Dec 07, 2018

When a VXLAN device becomes relevant to a driver (such as when it is
attached to an offloaded bridge), the driver will generally need to walk
the existing FDB entries and offload them.

Add a function vxlan_fdb_replay() to call a given notifier block for
each FDB entry with a given VNI.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4f89f5b5

vxlan: Add a function to init switchdev_notifier_vxlan_fdb_info · ff23b91c

Petr Machata authored Dec 07, 2018

There are currently two places that need to initialize the notifier info
structure, and one more is coming next when vxlan_fdb_replay() is
introduced. These three instances have / will have very similar code
that is easy to abstract away into a named function.

Add such function, vxlan_fdb_switchdev_notifier_info(), and call it from
vxlan_fdb_switchdev_call_notifiers() and vxlan_fdb_find_uc().
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ff23b91c

Merge branch 'net-aquantia-add-RSS-configuration' · 6b241e41

David S. Miller authored Dec 07, 2018

Igor Russkikh says:

====================
net: aquantia: add RSS configuration

In this patchset few bugs related to RSS are fixed and RSS table and
hash key configuration is added.

We also do increase max number of HW rings upto 8.

v2: removed extra arg check
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

6b241e41

net: aquantia: add support of RSS configuration · 39163767

Dmitry Bogdanov authored Dec 07, 2018

Add support of configuration of RSS hash key and RSS indirection table.
Signed-off-by: Dmitry Bogdanov <dmitry.bogdanov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

39163767

net: aquantia: fix initialization of RSS table · a8c69ca7

Dmitry Bogdanov authored Dec 07, 2018

Now RSS indirection table is initialized before setting up the number of
hw queues, consequently the table may be filled by non existing queues.
This patch moves the initialization when the number of hw queues is
known.
Signed-off-by: Dmitry Bogdanov <dmitry.bogdanov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a8c69ca7

net: aquantia: increase max number of hw queues · 71a963cf

Dmitry Bogdanov authored Dec 07, 2018

Increase the upper limit of the hw queues up to 8.
This makes RSS better on multiheaded cpus.

This is a maximum AQC hardware supports in one traffic class.

The actual value is still limited by a number of available cpu cores.
Signed-off-by: Dmitry Bogdanov <dmitry.bogdanov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

71a963cf

net: aquantia: fix RSS table and key sizes · 474fb115

Dmitry Bogdanov authored Dec 07, 2018

Set RSS indirection table and RSS hash key sizes to their real size.
Signed-off-by: Dmitry Bogdanov <dmitry.bogdanov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

474fb115

06 Dec, 2018 5 commits

Merge branch 'Pass-extack-to-NETDEV_PRE_UP' · ef2df7fc

David S. Miller authored Dec 06, 2018

Petr Machata says:

====================
Pass extack to NETDEV_PRE_UP

Drivers may need to validate configuration of a device that's about to
be upped. An example is mlxsw, which needs to check the configuration of
a VXLAN device attached to an offloaded bridge. Should the validation
fail, there's currently no way to communicate details of the failure to
the user, beyond an error number.

Therefore this patch set extends the NETDEV_PRE_UP event to include
extack, if available.

There are three vectors through which NETDEV_PRE_UP invocation can be
reached. The two major ones are dev_open() and dev_change_flags(), the
last is then __dev_change_flags().

In patch #1, the first access vector, dev_open() is addressed. An extack
parameter is added and all users converted to use it.

Before addressing the second vector, two preparatory patches propagate
extack argument to the proximity of the dev_change_flags() call in VRF
and IPVLAN drivers. That happens in patches #2 and #3. Then in patch #4,
dev_change_flags() is treated similarly to dev_open().

Likewise in patch #5, __dev_change_flags() is extended.

Then in patches #6 and #7, the extack is finally propagated all the way
to the point where the notification is emitted.

This change allows particularly mlxsw (which already has code to
leverage extack if available) to communicate to the user error messages
regarding VXLAN configuration. In patch #8, add a test case that
exercises this code and checks that an error message is propagated.

For example:

	local 192.0.2.17 remote 192.0.2.18 \
	dstport 4789 nolearning noudpcsum tos inherit ttl 100
	local 192.0.2.17 remote 192.0.2.18 \
	dstport 4789 nolearning noudpcsum tos inherit ttl 100
Error: mlxsw_spectrum: Conflicting NVE tunnels configuration.

v2:
- Add David Ahern's tags.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

ef2df7fc

selftests: mlxsw: Add a new test extack.sh · 1ba1daed

Petr Machata authored Dec 06, 2018

Add a testsuite dedicated to testing extack propagation and related
functionality.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1ba1daed

net: core: dev: Attach extack to NETDEV_PRE_UP · 40c900aa

Petr Machata authored Dec 06, 2018

Drivers may need to validate configuration of a device that's about to
be upped. Should the validation fail, there's currently no way to
communicate details of the failure to the user, beyond an error number.

To mend that, change __dev_open() to take an extack argument and pass it
from __dev_change_flags() and dev_open(), where it was propagated in the
previous patches.

Change __dev_open() to call call_netdevice_notifiers_extack() so that
the passed-in extack is attached to the NETDEV_PRE_UP notifier.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

40c900aa

net: core: dev: Add call_netdevice_notifiers_extack() · 26372605

Petr Machata authored Dec 06, 2018

In order to propagate extack through NETDEV_PRE_UP, add a new function
call_netdevice_notifiers_extack() that primes the extack field of the
notifier info. Convert call_netdevice_notifiers() to a simple wrapper
around the new function that passes NULL for extack.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

26372605

net: core: dev: Add extack argument to __dev_change_flags() · 6d040321

Petr Machata authored Dec 06, 2018

In order to pass extack together with NETDEV_PRE_UP notifications, it's
necessary to route the extack to __dev_open() from diverse (possibly
indirect) callers. The last missing API is __dev_change_flags().

Therefore extend __dev_change_flags() with and extra extack argument and
update the two existing users.

Since the function declaration line is changed anyway, name the struct
net_device argument to placate checkpatch.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6d040321