Commits · fb0d9c6339e0d8d1eb09414a09df8513499bbf79 · Kirill Smelkov / linux

04 Dec, 2018 21 commits

net: stmmac: convert to DEFINE_SHOW_ATTRIBUTE · fb0d9c63

Yangtao Li authored Dec 03, 2018

Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code.
Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fb0d9c63

nfp: convert to DEFINE_SHOW_ATTRIBUTE · 6f6c74fa

Yangtao Li authored Dec 03, 2018

Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code.
Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6f6c74fa

Merge branch 'mlx4_core-cleanups' · 76eb6ea4

David S. Miller authored Dec 03, 2018

Tariq Toukan says:

====================
mlx4_core cleanups

This patchset by Erez contains cleanups to the mlx4_core driver.

Patch 1 replaces -EINVAL with -EOPNOTSUPP for unsupported operations.
Patch 2 fixes some coding style issues.

Series generated against net-next commit:
97e6c858 net: usb: aqc111: Initialize wol_cfg with memset in aqc111_suspend
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

76eb6ea4

net/mlx4_core: Fix several coding style errors · 92a59ad0

Erez Alfasi authored Dec 02, 2018

Fix 3 checkpatch errors within mlx4/main.c:
- Unnecessary mlx4_debug_level global variable initialization to 0.
- Prohibited space before comma.
- Whitespaces instead of tab.
Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

92a59ad0

net/mlx4_core: Fix return codes of unsupported operations · 95aac2cd

Erez Alfasi authored Dec 02, 2018

Functions __set_port_type and mlx4_check_port_params returned
-EINVAL while the proper return code is -EOPNOTSUPP as a
result of an unsupported operation. All drivers should generate
this and all users should check for it when detecting an
unsupported functionality.
Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

95aac2cd

net: phy: Also request modules for C45 IDs · 30fcd6a9

Jose Abreu authored Dec 02, 2018

Logic of phy_device_create() requests PHY modules according to PHY ID
but for C45 PHYs we use different field for the IDs.

Let's also request the modules for these IDs.

Changes from v1:
- Only request C22 modules if C45 are not present (Andrew)
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Joao Pinto <joao.pinto@synopsys.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

30fcd6a9

Merge branch 'octeontx2-next' · 3eaf3ca6

David S. Miller authored Dec 03, 2018

Jerin Jacob says:

====================
octeontx2-af: NIX and NPC enhancements

This patchset is a continuation to earlier submitted four patch
series to add a new driver for Marvell's OcteonTX2 SOC's
Resource virtualization unit (RVU) admin function driver.

1. octeontx2-af: Add RVU Admin Function driver
   https://www.spinics.net/lists/netdev/msg528272.html
2. octeontx2-af: NPA and NIX blocks initialization
   https://www.spinics.net/lists/netdev/msg529163.html
3. octeontx2-af: NPC parser and NIX blocks initialization
   https://www.spinics.net/lists/netdev/msg530252.html
4. octeontx2-af: NPC MCAM support and FLR handling
   https://www.spinics.net/lists/netdev/msg534392.html

This patch series adds support for below

NPC block:
- Add NPC(mkex) profile support for various Key extraction configurations

NIX block:
- Enable dynamic RSS flow key algorithm configuration
- Enhancements on Rx checksum and error checks
- Add support for Tx packet marking support
- TL1 schedule queue allocation enhancements
- Add LSO format configuration mbox
- VLAN TPID configuration
- Skip multicast entry init for broadcast tables

v2:

- Rename FLOW_* to NIX_FLOW_* to avoid serious global namespace collisions,
as we have various FLOW_* definitions coming from
include/uapi/linux/pkt_cls.h for example.(David Miller)
- Pack the arguments of rvu_get_tl1_schqs() function
as 80 columns allows.(David Miller)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

3eaf3ca6

octeontx2-af: Enable mkex profile · 23705adb

Vamsi Attunuru authored Dec 02, 2018

The following set of NPC registers allow the driver to configure NPC
to generate different key value schemes to compare against packet
payload in MCAM search.

NPC_AF_INTF(0..1)_KEX_CFG
NPC_AF_KEX_LDATA(0..1)_FLAGS_CFG
NPC_AF_INTF(0..1)_LID(0..7)_LT(0..15)_LD(0..1)_CFG
NPC_AF_INTF(0..1)_LDATA(0..1)_FLAGS(0..15)_CFG

Currently, the AF driver populates these registers to
configure the default values to address the most common
use cases such as key generation for channel number + DMAC.

The secure firmware stores different configuration
value of these registers to enable different NPC use case
along with the name for the lookup.

Patch loads profile binary from secure firmware over
the exiting CGX mailbox interface and apply the profile.

AF driver shall fall back to the default configuration
in case of any errors.

The AF consumer driver can know the selected profile
on response to NPC_GET_KEX_CFG mailbox by introducing
mkex_pfl_name in the struct npc_get_kex_cfg_rsp.
Signed-off-by: Vamsi Attunuru <vamsi.attunuru@marvell.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

23705adb

octeontx2-af: Add LSO format configuration mailbox · da5d32e1

Nithin Dabilpuram authored Dec 02, 2018

NIX_AF_LSO_FORMAT(0..31)_FIELD(0..7) register enables an SW defined
means to define LSO packet modification formats.

0..31 works as an index to choose the algorithm, On success, the mailbox
returns the index to the client of chosen LSO algorithm selection.
This index will be used in configuring the transmit descriptors.

Add mailbox interface to dynamically reserve and configure LSO format.

This commit also fixes 'sizem1' for NIX_LSOALG_TCP_FLAGS
to '1' i.e 2 Bytes.
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

da5d32e1

octeontx2-af: Add L3 and L4 packet verification mailbox · 159a8a67

Vidhya Raman authored Dec 02, 2018

Adds mailbox support for L4 checksum verification
and L3 and L4 length verification configuration.
Signed-off-by: Vidhya Raman <vraman@marvell.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

159a8a67

octeontx2-af: Configure VLAN TPIDs · a253933e

Nithin Dabilpuram authored Dec 02, 2018

Setup TPID's for vlan0 and vlan1 for Tx VLAN insertion offloads.
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a253933e

octeontx2-af: Add support for Tx packet marking · a27d7659

Krzysztof Kanas authored Dec 02, 2018

NIX_AF_MARK_FORMAT(0..127)_CTL register enables an SW defined
means to mark/insert various data in the packet based on
final packet color from traffic shaping HW.

0..127 works as an index to choose the algorithm. On success,
the mailbox returns the index to the client.

Add NIX_MARK_FORMAT_CFG mailbox which reserves mark format based on
tuple (offset, y_mask, y_val, r_mask, r_val)

If the tuple is requested again for mark format that was already
reserved, then it will be reused. If not it will reserve a new entry
if space is available.

Also on AF init commonly used marker format such as VLAN DEI, IPv4
ECN, IPv4 DSCP are reserved for AF consumers.
Signed-off-by: Krzysztof Kanas <kkanas@marvell.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a27d7659

octeontx2-af: Enable RSS with promiscuous mode · f9f2da46

Vamsi Attunuru authored Dec 02, 2018

This patch adds support for enabling RSS in promiscuous mode
if RSS is already requested by the AF client.
Signed-off-by: Vamsi Attunuru <vamsi.attunuru@marvell.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f9f2da46

octeontx2-af: Define all NIX_AF_RX_DEF_* registers · 7c91a92e

Jerin Jacob authored Dec 02, 2018

In order to support all NIX specific valid length errors and
checksum errors on Rx, Update all NIX_AF_RX_DEF_* registers.

Also sorted all registers in HRM definition order.
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7c91a92e

octeontx2-af: Enable inner IPv4 checksum and its error code · 962e1bd6

Jerin Jacob authored Dec 02, 2018

This patch enables the inner IPv4 checksum and
defines the error code for Rx inner and outer checksum errors.
Setting ERRCODE as 1 so that CQE descriptor can be embedded
valid checksum error code and the driver can interpret
checksum error as ERRLEV = LID + 1 and ERRCODE = 1.
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

962e1bd6

octeontx2-af: Allow freeing single TLx Tx schedule queue · e2703c5f

Nithin Dabilpuram authored Dec 02, 2018

The default behavior was to free all the TLx Tx schedule
queues. This patch adds support for freeing a single Tx
schedule queue if TXSCHQ_FREE_ALL flag is not set.
Signed-off-by: Krzysztof Kanas <kkanas@marvell.com>
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e2703c5f

octeontx2-af: Restrict TL1 allocation and configuration · 26dda7da

Nithin Dabilpuram authored Dec 02, 2018

TL1 is the root node in the scheduling hierarchy and
it is a global resource with a limited number.

This patch introduces restriction and validation on
the allocation of the TL1 nodes for the effective resource
sharing across the AF consumers.

- Limit TL1 allocation to 2 per lmac.
  One could be for the normal link and one for IEEE802.3br
  express link (Express Send DMA).
  Effectively all the VF's of an RVU PF(lmac) share the two TL1 schqs.
- TL1 cannot be freed once allocated.
- Allow VF's to only apply default config to TL1 if not
  already applied. PF's can always overwrite the TL1 config.
- Consider NIX_AQ_INSTOP_WRITE while validating txschq
  when sq.ena is set.
Signed-off-by: Krzysztof Kanas <kkanas@marvell.com>
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

26dda7da

octeontx2-af: Add support for runtime RSS algo index reservation · 7ee74697

Jerin Jacob authored Dec 02, 2018

Introduced reserve_flowkey_alg_idx()to reserve RSS algorithm index,
it would internally use set_flowkey_fields() to generate fields
based on the flow key dynamically.

On AF driver init, it would reserve a predefined set RSS algo indexes,
which will be available all the time for all the AF driver consumers.
The leftover algo indexes can be reserved at runtime through
exiting nix_rss_flowkey_cfg mailbox message.

The NIX_FLOW_KEY_TYPE_PORT is removed from predefined a set of RSS flow
type as it is not used by any consumer.
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7ee74697

octeontx2-af: Add support for dynamic flow cfg to RSS field generation · b648366c

Jerin Jacob authored Dec 02, 2018

Introduce state-based algorithm to convert the flow_key value
to RSS algo field used by NIX_AF_RX_FLOW_KEY_ALGX_FIELDX register.

The outer `for loop` goes over _all_ protocol field and the following
variables depict the state machine forward progress logic.

a) keyoff_marker - Enabled when hash byte length needs to be accounted
in field->key_offset update.
b) field_marker - Enabled when a new field needs to be selected.
c) group_member - Enabled when a protocol is part of a group.

This would remove the existing hard coding and enable to add
new protocol support seamlessly.
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b648366c

octeontx2-af: Add response for RSS flow key cfg message · bd522d68

Jerin Jacob authored Dec 02, 2018

Added response for nix_rss_flowkey_cfg message to return
selected RSS algorithm index.

The FLOW_KEY_TYPE* definition is part of the mbox message and
it will be used by the other consumers of AF driver hence moving to mbox.h.

Also renamed FLOW_* definitions to NIX_FLOW_* to avoid global
name space collisions, as we have various coming from
include/uapi/linux/pkt_cls.h for example.
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bd522d68

octeontx2-af: Skip NIXLF check for bcast MCE entry · c5e4e4d1

Sunil Goutham authored Dec 02, 2018

At the time of initial broadcast packet replication table init,
NIXLFs are not yet attached to PF_FUNCs. Hence skipped checking
NIXLF while submitting MCE entry init instruction to NIX admin queue.

Also did a minor cleanup while installing bcast match entry in
packet parser unit i.e NPC.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c5e4e4d1

03 Dec, 2018 19 commits

Merge branch 'udp-msg_zerocopy' · 6e360f73

David S. Miller authored Dec 03, 2018

Willem de Bruijn says:

====================
udp msg_zerocopy

Enable MSG_ZEROCOPY for udp sockets

Patch 1/3 is the main patch, a rework of RFC patch
  http://patchwork.ozlabs.org/patch/899630/
  more details in the patch commit message

Patch 2/3 is an optimization to remove a branch from the UDP hot path
  and refcount_inc/refcount_dec_and_test pair when zerocopy is used.
  This used to be included in the first patch in v2.

Patch 3/3 runs the already existing udp zerocopy tests
  as part of kselftest

See also recent Linux Plumbers presentation
  https://linuxplumbersconf.org/event/2/contributions/106/attachments/104/128/willemdebruijn-lpc2018-udpgso-presentation-20181113.pdf

Changes:
  v1 -> v2
    - Fixup reverse christmas tree violation
  v2 -> v3
    - Split refcount avoidance optimization into separate patch
      - Fix refcount leak on error in fragmented case
        (thanks to Paolo Abeni for pointing this one out!)
      - Fix refcount inc on zero
  v3 -> v4
    - Move skb_zcopy_set below the only kfree_skb that might cause
      a premature uarg destroy before skb_zerocopy_put_abort
      - Move the entire skb_shinfo assignment block, to keep that
	cacheline access in one place
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

6e360f73

selftests: extend zerocopy tests to udp · db63e489

Willem de Bruijn authored Nov 30, 2018

Both msg_zerocopy and udpgso_bench have udp zerocopy variants.
Exercise these as part of the standard kselftest run.

With udp, msg_zerocopy has no control channel. Ensure that the
receiver exits after the sender by accounting for the initial
delay in starting them (in msg_zerocopy.sh).
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

db63e489

udp: elide zerocopy operation in hot path · 52900d22

Willem de Bruijn authored Nov 30, 2018

With MSG_ZEROCOPY, each skb holds a reference to a struct ubuf_info.
Release of its last reference triggers a completion notification.

The TCP stack in tcp_sendmsg_locked holds an extra ref independent of
the skbs, because it can build, send and free skbs within its loop,
possibly reaching refcount zero and freeing the ubuf_info too soon.

The UDP stack currently also takes this extra ref, but does not need
it as all skbs are sent after return from __ip(6)_append_data.

Avoid the extra refcount_inc and refcount_dec_and_test, and generally
the sock_zerocopy_put in the common path, by passing the initial
reference to the first skb.

This approach is taken instead of initializing the refcount to 0, as
that would generate error "refcount_t: increment on 0" on the
next skb_zcopy_set.

Changes
  v3 -> v4
    - Move skb_zcopy_set below the only kfree_skb that might cause
      a premature uarg destroy before skb_zerocopy_put_abort
      - Move the entire skb_shinfo assignment block, to keep that
        cacheline access in one place
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

52900d22

udp: msg_zerocopy · b5947e5d

Willem de Bruijn authored Nov 30, 2018

Extend zerocopy to udp sockets. Allow setting sockopt SO_ZEROCOPY and
interpret flag MSG_ZEROCOPY.

This patch was previously part of the zerocopy RFC patchsets. Zerocopy
is not effective at small MTU. With segmentation offload building
larger datagrams, the benefit of page flipping outweights the cost of
generating a completion notification.

tools/testing/selftests/net/msg_zerocopy.sh after applying follow-on
test patch and making skb_orphan_frags_rx same as skb_orphan_frags:

    ipv4 udp -t 1
    tx=191312 (11938 MB) txc=0 zc=n
    rx=191312 (11938 MB)
    ipv4 udp -z -t 1
    tx=304507 (19002 MB) txc=304507 zc=y
    rx=304507 (19002 MB)
    ok
    ipv6 udp -t 1
    tx=174485 (10888 MB) txc=0 zc=n
    rx=174485 (10888 MB)
    ipv6 udp -z -t 1
    tx=294801 (18396 MB) txc=294801 zc=y
    rx=294801 (18396 MB)
    ok

Changes
  v1 -> v2
    - Fixup reverse christmas tree violation
  v2 -> v3
    - Split refcount avoidance optimization into separate patch
      - Fix refcount leak on error in fragmented case
        (thanks to Paolo Abeni for pointing this one out!)
      - Fix refcount inc on zero
      - Test sock_flag SOCK_ZEROCOPY directly in __ip_append_data.
        This is needed since commit 5cf4a853 ("tcp: really ignore
	MSG_ZEROCOPY if no SO_ZEROCOPY") did the same for tcp.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b5947e5d

Merge tag 'wireless-drivers-next-for-davem-2018-11-30' of... · ce01a56b

David S. Miller authored Dec 03, 2018

Merge tag 'wireless-drivers-next-for-davem-2018-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next

Kalle Valo says:

====================
wireless-drivers-next patches for 4.21

First set of patches for 4.21. Most notable here is support for
Quantenna's QSR1000/QSR2000 chipsets and more flexible ways to provide
nvram files for brcmfmac.

Major changes:

brcmfmac

* add support for first trying to get a board specific nvram file

* add support for getting nvram contents from EFI variables

qtnfmac

* use single PCIe driver for all platforms and rename
  Kconfig option CONFIG_QTNFMAC_PEARL_PCIE to CONFIG_QTNFMAC_PCIE

* add support for QSR1000/QSR2000 (Topaz) family of chipsets

ath10k

* add support for WCN3990 firmware crash recovery

* add firmware memory dump support for QCA4019

wil6210

* add firmware error recovery while in AP mode

ath9k

* remove experimental notice from dynack feature

iwlwifi

* PCI IDs for some new 9000-series cards

* improve antenna usage on connection problems

* new firmware debugging infrastructure

* some more work on 802.11ax

* improve support for multiple RF modules with 22000 devices

cordic

* move cordic macros and defines to a public header file

* convert brcmsmac and b43 to fully use cordic library
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

ce01a56b

Merge branch 'davinci_emac-read-the-MAC-address-from-nvmem' · 37a0bc39

David S. Miller authored Dec 03, 2018

Bartosz Golaszewski says:

====================
davinci_emac: read the MAC address from nvmem

This series is part of a bigger series that aims at removing the platform
data structure from the at24 EEPROM driver[1].

We provide a generalized version of of_get_nvmem_mac_address(), switch the
only user of the of_ variant to using it, remove the previous
implementation and use the new routine in the davinci_emac driver.

[1] https://lkml.org/lkml/2018/11/13/884
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

37a0bc39

net: davinci_emac: use nvmem_get_mac_address() · 18dbfc81

Bartosz Golaszewski authored Nov 30, 2018

All DaVinci boards still supported in board files now define nvmem
cells containing the MAC address. We want to stop using the setup
callback from at24 so the MAC address for those users will no longer
be provided over platform data. If we didn't get a valid MAC in pdata,
try nvmem before resorting to a random MAC.
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

18dbfc81

of: net: kill of_get_nvmem_mac_address() · afa64a72

Bartosz Golaszewski authored Nov 30, 2018

We've switched all users to nvmem_get_mac_address(). Remove the now
dead code.
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

afa64a72

net: cadence: switch to using nvmem_get_mac_address() · cce41b8f

Bartosz Golaszewski authored Nov 30, 2018

We now have a generalized helper routine to read the MAC address from
nvmem which takes struct device as argument. The nvmem subsystem will
then try device tree first before all other potential providers.
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cce41b8f

net: ethernet: provide nvmem_get_mac_address() · 0e839df9

Bartosz Golaszewski authored Nov 30, 2018

We already have of_get_nvmem_mac_address() but some non-DT systems want
to read the MAC address from NVMEM too. Implement a generalized routine
that takes struct device as argument.
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0e839df9

rhashtable: detect when object movement between tables might have invalidated a lookup · 82208d0d

NeilBrown authored Nov 30, 2018

Some users of rhashtables might need to move an object from one table
to another -  this appears to be the reason for the incomplete usage
of NULLS markers.

To support these, we store a unique NULLS_MARKER at the end of
each chain, and when a search fails to find a match, we check
if the NULLS marker found was the expected one.  If not, the search
may not have examined all objects in the target bucket, so it is
repeated.

The unique NULLS_MARKER is derived from the address of the
head of the chain.  As this cannot be derived at load-time the
static rhnull in rht_bucket_nested() needs to be initialised
at run time.

Any caller of a lookup function must still be prepared for the
possibility that the object returned is in a different table - it
might have been there for some time.

Note that this does NOT provide support for other uses of
NULLS_MARKERs such as allocating with SLAB_TYPESAFE_BY_RCU or changing
the key of an object and re-inserting it in the same table.
These could only be done safely if new objects were inserted
at the *start* of a hash chain, and that is not currently the case.
Signed-off-by: NeilBrown <neilb@suse.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

82208d0d

Merge branch 'hns3-ethtool-dump' · 77ac327c

David S. Miller authored Dec 03, 2018

Salil Mehta says:

====================
Adds VF/PF PCIe reg dump(ethtool -d) support to HNS3 driver

This patchset adds VF/PF PCIe register dump support to HNS3 VF and PF
driver using "ethtool -d" command.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

77ac327c

net: hns3: Adds support to dump(using ethool-d) PCIe regs in HNS3 PF driver · ea4750ca

Jian Shen authored Nov 29, 2018

This patch adds support to dump PF PCIe registers using ethtool -d
for HNS3 PF Driver.
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ea4750ca

net: hns3: Support "ethtool -d" for HNS3 VF driver · 1600c3e5

Jian Shen authored Nov 29, 2018

This patch adds "ethtool -d" support for HNS3 VF Driver.
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1600c3e5

net: phy: improve generic EEE ethtool functions · d1420bb9

Heiner Kallweit authored Nov 27, 2018

So far the two functions consider neither member eee_enabled nor
eee_active. Therefore network drivers have to do this in some kind
of glue code. I think this can be avoided.

Getting EEE parameters:
When not advertising any EEE mode, we can't consider EEE to be enabled.
Therefore interpret "EEE enabled" as "we advertise at least one EEE
mode". It's similar with "EEE active": interpret it as "EEE modes
advertised by both link partner have at least one mode in common".

Setting EEE parameters:
If eee_enabled isn't set, don't advertise any EEE mode and restart
aneg if needed to switch off EEE. If eee_enabled is set and
data->advertised is empty (e.g. because EEE was disabled), advertise
everything we support as default. This way EEE can easily switched
on/off by doing ethtool --set-eee <if> eee on/off, w/o any additional
parameters.

The changes to both functions shouldn't break any existing user.
Once the changes have been applied, at least some users can be
simplified.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d1420bb9

Merge branch 'VXLAN-underlay-VRF' · 79dfab43

David S. Miller authored Dec 03, 2018

Alexis Bauvin says:

====================
net: Add VRF support for VXLAN underlay

v6 -> v7:
- proper locking for device in udp_tunnel following Sabrina Dubroca's advice

v5 -> v6:
- remove automatic rebinding patch following Roopa Prabhu's advice

v4 -> v5:
- move test script to its own patch (6/6)
- add schematic for test script
- apply David Ahern comments to the test script

v3 -> v4:
- rename vxlan_is_in_l3mdev_chain to netdev_is_upper master
- move it to net/core/dev.c
- make it return bool instead of int
- check if remote_ifindex is zero before resolving the l3mdev
- add testing script

v2 -> v3:
- fix build when CONFIG_NET_IPV6 is off
- fix build "unused l3mdev_master_upper_ifindex_by_index" build error with some
  configs

v1 -> v2:
- move vxlan_get_l3mdev from vxlan driver to l3mdev driver as
  l3mdev_master_upper_ifindex_by_index
- vxlan: rename variables named l3mdev_ifindex to ifindex

v0 -> v1:
- fix typos

We are trying to isolate the VXLAN traffic from different VMs with VRF as shown
in the schemas below:

+-------------------------+   +----------------------------+
| +----------+            |   |     +------------+         |
| |          |            |   |     |            |         |
| | tap-red  |            |   |     |  tap-blue  |         |
| |          |            |   |     |            |         |
| +----+-----+            |   |     +-----+------+         |
|      |                  |   |           |                |
|      |                  |   |           |                |
| +----+---+              |   |      +----+----+           |
| |        |              |   |      |         |           |
| | br-red |              |   |      | br-blue |           |
| |        |              |   |      |         |           |
| +----+---+              |   |      +----+----+           |
|      |                  |   |           |                |
|      |                  |   |           |                |
|      |                  |   |           |                |
| +----+--------+         |   |     +--------------+       |
| |             |         |   |     |              |       |
| |  vxlan-red  |         |   |     |  vxlan-blue  |       |
| |             |         |   |     |              |       |
| +------+------+         |   |     +-------+------+       |
|        |                |   |             |              |
|        |           VRF  |   |             |          VRF |
|        |           red  |   |             |         blue |
+-------------------------+   +----------------------------+
         |                                  |
         |                                  |
 +---------------------------------------------------------+
 |       |                                  |              |
 |       |                                  |              |
 |       |         +--------------+         |              |
 |       |         |              |         |              |
 |       +---------+  eth0.2030   +---------+              |
 |                 |  10.0.0.1/24 |                        |
 |                 +-----+--------+                    VRF |
 |                       |                            green|
 +---------------------------------------------------------+
                         |
                         |
                    +----+---+
                    |        |
                    |  eth0  |
                    |        |
                    +--------+

iproute2 commands to reproduce the setup:

ip link add green type vrf table 1
ip link set green up
ip link add eth0.2030 link eth0 type vlan id 2030
ip link set eth0.2030 master green
ip addr add 10.0.0.1/24 dev eth0.2030
ip link set eth0.2030 up

ip link add blue type vrf table 2
ip link set blue up
ip link add br-blue type bridge
ip link set br-blue master blue
ip link set br-blue up
ip link add vxlan-blue type vxlan id 2 local 10.0.0.1 dev eth0.2030 \
 port 4789
ip link set vxlan-blue master br-blue
ip link set vxlan-blue up
ip link set tap-blue master br-blue
ip link set tap-blue up

ip link add red type vrf table 3
ip link set red up
ip link add br-red type bridge
ip link set br-red master red
ip link set br-red up
ip link add vxlan-red type vxlan id 3 local 10.0.0.1 dev eth0.2030 \
 port 4789
ip link set vxlan-red master br-red
ip link set vxlan-red up
ip link set tap-red master br-red
ip link set tap-red up

We faced some issue in the datapath, here are the details:

* Egress traffic:
The vxlan packets are sent directly to the default VRF because it's where the
socket is bound, therefore the traffic has a default route via eth0. the
workaround is to force this traffic to VRF green with ip rules.

* Ingress traffic:
When receiving the traffic on eth0.2030 the vxlan socket is unreachable from
VRF green. The workaround is to enable *udp_l3mdev_accept* sysctl, but
this breaks isolation between overlay and underlay: packets sent from
blue or red by e.g. a guest VM will be accepted by the socket, allowing
injection of VXLAN packets from the overlay.

This patch series fixes the issues describe above by allowing VXLAN socket to be
bound to a specific VRF device therefore looking up in the correct table.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

79dfab43

test/net: Add script for VXLAN underlay in a VRF · 03f1c26b

Alexis Bauvin authored Dec 03, 2018

This script tests the support of a VXLAN underlay in a non-default VRF.

It does so by simulating two hypervisors and two VMs, an extended L2
between the VMs with the hypervisors as VTEPs with the underlay in a
VRF, and finally by pinging the two VMs.

It also tests that moving the underlay from a VRF to another works when
down/up the VXLAN interface.
Signed-off-by: Alexis Bauvin <abauvin@scaleway.com>
Reviewed-by: Amine Kherbouche <akherbouche@scaleway.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Tested-by: Amine Kherbouche <akherbouche@scaleway.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

03f1c26b

vxlan: add support for underlay in non-default VRF · aab8cc36

Alexis Bauvin authored Dec 03, 2018

Creating a VXLAN device with is underlay in the non-default VRF makes
egress route lookup fail or incorrect since it will resolve in the
default VRF, and ingress fail because the socket listens in the default
VRF.

This patch binds the underlying UDP tunnel socket to the l3mdev of the
lower device of the VXLAN device. This will listen in the proper VRF and
output traffic from said l3mdev, matching l3mdev routing rules and
looking up the correct routing table.

When the VXLAN device does not have a lower device, or the lower device
is in the default VRF, the socket will not be bound to any interface,
keeping the previous behaviour.

The underlay l3mdev is deduced from the VXLAN lower device
(IFLA_VXLAN_LINK).

+----------+                         +---------+
|          |                         |         |
| vrf-blue |                         | vrf-red |
|          |                         |         |
+----+-----+                         +----+----+
     |                                    |
     |                                    |
+----+-----+                         +----+----+
|          |                         |         |
| br-blue  |                         | br-red  |
|          |                         |         |
+----+-----+                         +---+-+---+
     |                                   | |
     |                             +-----+ +-----+
     |                             |             |
+----+-----+                +------+----+   +----+----+
|          |  lower device  |           |   |         |
|   eth0   | <- - - - - - - | vxlan-red |   | tap-red | (... more taps)
|          |                |           |   |         |
+----------+                +-----------+   +---------+
Signed-off-by: Alexis Bauvin <abauvin@scaleway.com>
Reviewed-by: Amine Kherbouche <akherbouche@scaleway.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Tested-by: Amine Kherbouche <akherbouche@scaleway.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

aab8cc36

l3mdev: add function to retreive upper master · 6a6d6681

Alexis Bauvin authored Dec 03, 2018

Existing functions to retreive the l3mdev of a device did not walk the
master chain to find the upper master. This patch adds a function to
find the l3mdev, even indirect through e.g. a bridge:

+----------+
|          |
| vrf-blue |
|          |
+----+-----+
     |
     |
+----+-----+
|          |
| br-blue  |
|          |
+----+-----+
     |
     |
+----+-----+
|          |
|   eth0   |
|          |
+----------+

This will properly resolve the l3mdev of eth0 to vrf-blue.
Signed-off-by: Alexis Bauvin <abauvin@scaleway.com>
Reviewed-by: Amine Kherbouche <akherbouche@scaleway.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Tested-by: Amine Kherbouche <akherbouche@scaleway.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6a6d6681