Commits · f5836ca5e9867fa6ab88cadb9873af56d9ceb589 · Kirill Smelkov / linux

29 Aug, 2017 18 commits

xdp: separate xdp_redirect tracepoint in error case · f5836ca5

Jesper Dangaard Brouer authored Aug 29, 2017

There is a need to separate the xdp_redirect tracepoint into two
tracepoints, for separating the error case from the normal forward
case.

Due to the extreme speeds XDP is operating at, loading a tracepoint
have a measurable impact.  Single core XDP REDIRECT (ethtool tuned
rx-usecs 25) can do 13.7 Mpps forwarding, but loading a simple
bpf_prog at the tracepoint (with a return 0) reduce perf to 10.2 Mpps
(CPU E5-1650 v4 @ 3.60GHz, driver: ixgbe)

The overhead of loading a bpf-based tracepoint can be calculated to
cost 25 nanosec ((1/13782002-1/10267937)*10^9 = -24.83 ns).

Using perf record on the tracepoint event, with a non-matching --filter
expression, the overhead is much larger. Performance drops to 8.3 Mpps,
cost 48 nanosec ((1/13782002-1/8312497)*10^9 = -47.74))

Having a separate tracepoint for err cases, which should be less
frequent, allow running a continuous monitor for errors while not
affecting the redirect forward performance (this have also been
verified by measurements).
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

f5836ca5

xdp: make xdp tracepoints report bpf prog id instead of prog_tag · b06337df

Jesper Dangaard Brouer authored Aug 29, 2017

Given previous patch expose the map_id, it seems natural to also
report the bpf prog id.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b06337df

xdp: tracepoint xdp_redirect also need a map argument · 8d3b778f

Jesper Dangaard Brouer authored Aug 29, 2017

To make sense of the map index, the tracepoint user also need to know
that map we are talking about.  Supply the map pointer but only expose
the map->id.

The 'to_index' is renamed 'to_ifindex'.  In the xdp_redirect_map case,
this is the result of the devmap lookup. The map lookup key is exposed
as map_index, which is needed to troubleshoot in case the lookup failed.
The 'to_ifindex' is placed after 'err' to keep TP_STRUCT as common as
possible.

This also keeps the TP_STRUCT similar enough, that userspace can write
a monitor program, that doesn't need to care about whether
bpf_redirect or bpf_redirect_map were used.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8d3b778f

xdp: remove redundant argument to trace_xdp_redirect · c31e5a48

Jesper Dangaard Brouer authored Aug 29, 2017

Supplying the action argument XDP_REDIRECT to the tracepoint xdp_redirect
is redundant as it is only called in-case this action was specified.

Remove the argument, but keep "act" member of the tracepoint struct and
populate it with XDP_REDIRECT.  This makes it easier to write a common bpf_prog
processing events.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c31e5a48

Merge tag 'rxrpc-next-20170829' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs · d0fcece7

David S. Miller authored Aug 29, 2017

David Howells says:

====================
rxrpc: Miscellany

Here are a number of patches that make some changes/fixes and add a couple
of extensions to AF_RXRPC for kernel services to use.  The changes and
fixes are:

 (1) Use time64_t rather than u32 outside of protocol or
     UAPI-representative structures.

 (2) Use the correct time stamp when loading a key from an XDR-encoded
     Kerberos 5 key.

 (3) Fix IPv6 support.

 (4) Fix some places where the error code is being incorrectly made
     positive before returning.

 (5) Remove some white space.

And the extensions:

 (6) Add an end-of-Tx phase notification, thereby allowing kAFS to
     transition the state on its own call record at the correct point,
     rather than having to do it in advance and risk non-completion of the
     call in the wrong state.

 (7) Allow a kernel client call to be retried if it fails on a network
     error, thereby making it possible for kAFS to iterate over a number of
     IP addresses without having to reload the Tx queue and re-encrypt data
     each time.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

d0fcece7

Merge branch 'addrlabel-no-rtnl-locking' · 3d86e352

David S. Miller authored Aug 29, 2017

Florian Westphal says:

====================
addrlabel: don't use rtnl locking

addrlabel doesn't appear to require rtnl lock as the addrlabel
table uses a spinlock to serialize add/delete operations.

Also, entries are reference counted so it should be safe
to call the rtnl ops without the rtnl mutex.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

3d86e352

addrlabel: add/delete/get can run without rtnl · a6f57028

Florian Westphal authored Aug 29, 2017

There appears to be no need to use rtnl, addrlabel entries are refcounted
and add/delete is serialized by the addrlabel table spinlock.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

a6f57028

selftests: add addrlabel add/delete to rtnetlink.sh · 34504029
Florian Westphal authored Aug 29, 2017
```
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
```
34504029

staging: irda: update MAINTAINERS · 6c766db6

Greg Kroah-Hartman authored Aug 29, 2017

Now that the IRDA code has moved under drivers/staging/irda/, update the
MAINTAINERS file with the new location.
Reported-by: Joe Perches <joe@perches.com>
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
Signed-off-by: David S. Miller <davem@davemloft.net>

6c766db6

bnxt_en: add a dummy definition for bnxt_vf_rep_get_fid() · f143647a

Sathya Perla authored Aug 29, 2017

When bnxt VF-reps are not compiled in (CONFIG_BNXT_SRIOV is off)
bnxt_tc.c needs a dummy definition of the routine bnxt_vf_rep_get_fid().
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Fixes: 2ae7408f ("bnxt_en: bnxt: add TC flower filter offload support")
Signed-off-by: Sathya Perla <sathya.perla@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f143647a

rxrpc: Allow failed client calls to be retried · c038a58c

David Howells authored Aug 29, 2017

Allow a client call that failed on network error to be retried, provided
that the Tx queue still holds DATA packet 1.  This allows an operation to
be submitted to another server or another address for the same server
without having to repackage and re-encrypt the data so far processed.

Two new functions are provided:

 (1) rxrpc_kernel_check_call() - This is used to find out the completion
     state of a call to guess whether it can be retried and whether it
     should be retried.

 (2) rxrpc_kernel_retry_call() - Disconnect the call from its current
     connection, reset the state and submit it as a new client call to a
     new address.  The new address need not match the previous address.

A call may be retried even if all the data hasn't been loaded into it yet;
a partially constructed will be retained at the same point it was at when
an error condition was detected.  msg_data_left() can be used to find out
how much data was packaged before the error occurred.
Signed-off-by: David Howells <dhowells@redhat.com>

c038a58c

rxrpc: Add notification of end-of-Tx phase · e833251a

David Howells authored Aug 29, 2017

Add a callback to rxrpc_kernel_send_data() so that a kernel service can get
a notification that the AF_RXRPC call has transitioned out the Tx phase and
is now waiting for a reply or a final ACK.

This is called from AF_RXRPC with the call state lock held so the
notification is guaranteed to come before any reply is passed back.

Further, modify the AFS filesystem to make use of this so that we don't have
to change the afs_call state before sending the last bit of data.
Signed-off-by: David Howells <dhowells@redhat.com>

e833251a

rxrpc: Remove some excess whitespace · 3ec0efde

David Howells authored Aug 29, 2017

Remove indentation from some blank lines.
Signed-off-by: David Howells <dhowells@redhat.com>

3ec0efde

rxrpc: Don't negate call->error before returning it · bd2db2d2

David Howells authored Aug 29, 2017

call->error is stored as 0 or a negative error code.  Don't negate this
value (ie. make it positive) before returning it from a kernel function
(though it should still be negated before passing to userspace through a
control message).
Signed-off-by: David Howells <dhowells@redhat.com>

bd2db2d2

rxrpc: Fix IPv6 support · 7b674e39

David Howells authored Aug 29, 2017

Fix IPv6 support in AF_RXRPC in the following ways:

 (1) When extracting the address from a received IPv4 packet, if the local
     transport socket is open for IPv6 then fill out the sockaddr_rxrpc
     struct for an IPv4-mapped-to-IPv6 AF_INET6 transport address instead
     of an AF_INET one.

 (2) When sending CHALLENGE or RESPONSE packets, the transport length needs
     to be set from the sockaddr_rxrpc::transport_len field rather than
     sizeof() on the IPv4 transport address.

 (3) When processing an IPv4 ICMP packet received by an IPv6 socket, set up
     the address correctly before searching for the affected peer.
Signed-off-by: David Howells <dhowells@redhat.com>

7b674e39

rxrpc: Use correct timestamp from Kerberos 5 ticket · 0a378585

David Howells authored Aug 29, 2017

When an XDR-encoded Kerberos 5 ticket is added as an rxrpc-type key, the
expiry time should be drawn from the k5 part of the token union (which was
what was filled in), rather than the kad part of the union.
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David Howells <dhowells@redhat.com>

0a378585

net: rxrpc: Replace time_t type with time64_t type · 10674a03

Baolin Wang authored Aug 29, 2017

Since the 'expiry' variable of 'struct key_preparsed_payload' has been
changed to 'time64_t' type, which is year 2038 safe on 32bits system.

In net/rxrpc subsystem, we need convert 'u32' type to 'time64_t' type
when copying ticket expires time to 'prep->expiry', then this patch
introduces two helper functions to help convert 'u32' to 'time64_t'
type.

This patch also uses ktime_get_real_seconds() to get current time instead
of get_seconds() which is not year 2038 safe on 32bits system.
Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
Signed-off-by: David Howells <dhowells@redhat.com>

10674a03

hinic: don't build the module by default · c8488a8a

Vitaly Kuznetsov authored Aug 28, 2017

We probably don't want to enable code supporting particular hardware by
default e.g. when someone does 'make defconfig'. Other ethernet modules
don't do it.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c8488a8a

28 Aug, 2017 22 commits

Merge branch 'bnxt_en-next' · acae4b48

David S. Miller authored Aug 28, 2017

Michael Chan says:

====================
bnxt_en: Updates.

Various changes including updated firmware interface, improved TX ring
allocation scheme, improved out-of-memory logic in NAPI loop, reduced
default rings on multi-port devices, new PCI IDs. Of particular note,

CPU affinity hints from Vasundhara Volam.

TC Flower eswitch support from Sathya Perla.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

acae4b48

bnxt_en: add code to query TC flower offload stats · d7bc7305

Sathya Perla authored Aug 28, 2017

This patch adds code to implement TC_CLSFLOWER_STATS TC-cmd and the
required FW code to query the stats from the HW.
Signed-off-by: Sathya Perla <sathya.perla@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d7bc7305

bnxt_en: add TC flower offload flow_alloc/free FW cmds · db1d36a2

Sathya Perla authored Aug 28, 2017

This patch adds the hwrm_cfa_flow_alloc/free() routines
that are needed to issue the FW cmds needed for TC flower offload.
Signed-off-by: Sathya Perla <sathya.perla@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

db1d36a2

bnxt_en: bnxt: add TC flower filter offload support · 2ae7408f

Sathya Perla authored Aug 28, 2017

This patch adds support for offloading TC based flow
rules and actions for the 'flower' classifier in the bnxt_en driver.
It includes logic to parse flow rules and actions received from the
TC subsystem, store them and issue the corresponding
hwrm_cfa_flow_alloc/free FW cmds. L2/IPv4/IPv6 flows and drop,
redir, vlan push/pop actions are supported in this patch.

In this patch the hwrm_cfa_flow_xxx routines are just stubs.
The code for these routines is introduced in the next patch for easier
review. Also, the code to query the TC/flower action stats will
be introduced in a subsequent patch.
Signed-off-by: Sathya Perla <sathya.perla@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2ae7408f

bnxt_en: fix clearing devlink ptr from bnxt struct · 70855603

Sathya Perla authored Aug 28, 2017

The routine bnxt_link_bp_to_dl() is used to set the devlink ptr
in bnxt struct (bp) and also to set the bnxt back ptr in
the devlink struct.  If devlink_register() fails, bp->dl must
be cleared which is not happening currently. This patch fixes
bnxt_link_bp_to_dl() to clear bp->dl by passing  a NULL dl ptr.

Fixes: 4ab0c6a8 ("bnxt_en: add support to enable VF-representors")
Signed-off-by: Sathya Perla <sathya.perla@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

70855603

bnxt_en: Reduce default rings on multi-port cards. · d5430d31

Michael Chan authored Aug 28, 2017

Reduce default rings from 8 to 4 on multi-port cards to reduce memory
usage.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d5430d31

bnxt_en: Improve -ENOMEM logic in NAPI poll loop. · 903649e7

Michael Chan authored Aug 28, 2017

If we cannot allocate RX buffers in the NAPI poll loop when processing
an RX event, the current code does not count that event towards the NAPI
budget.  This can cause us to potentially loop forever in NAPI if we
consistently cannot allocate new buffers.  Improve it by counting
-ENOMEM event as 1 towards the NAPI budget.

Cc: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reported-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

903649e7

bnxt: initialize board_info values with proper enums · 27573a7d

Scott Branden authored Aug 28, 2017

initialize board_info values with proper enums for defensive programming
purposes.  This will avoid any errors of the enums being declared not
lining up with the board_info array.
Signed-off-by: Scott Branden <scott.branden@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

27573a7d

bnxt: Add PCIe device IDs for bcm58802/bcm58808 · 4a58139b

Ray Jui authored Aug 28, 2017

Add PCIe device ID for bcm58802 and bcm58808. Also add chip number
update to declare bcm588xx as chip class phase 4 and later
Signed-off-by: Ray Jui <ray.jui@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4a58139b

bnxt_en: assign CPU affinity hints to bnxt_en IRQs · 56f0fd80

Vasundhara Volam authored Aug 28, 2017

This patch provides hints to irqbalance to map bnxt_en device IRQs
to specific CPU cores. cpumask_local_spread() is used, which first
maps IRQs to near NUMA cores; when those cores are exhausted, IRQs
are mapped to far NUMA cores.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

56f0fd80

bnxt_en: Improve tx ring reservation logic. · 98fdbe73

Michael Chan authored Aug 28, 2017

When the number of TX rings is changed (e.g. ethtool -L, enabling XDP TX
rings, etc), the current code tries to reserve the new number of TX rings
before closing and re-opening the NIC. If we are unable to reserve the
new TX rings, we abort the operation and keep the current TX rings.

The problem is that the firmware will disable the current TX rings even
when it cannot reserve the new set of TX rings. We fix it as follows:

1. Instead of reserving the new set of TX rings, just ask the firmware
to check if the new set of TX rings is available. There is a flag in
the firmware message to do that. If not available, abort and the
current TX rings will not be disabled.

2. Do the actual TX ring reservation in the path that opens the NIC.
We keep the number of TX rings currently successfully reserved. If the
number of TX rings is different than the reserved TX rings, we call
firmware and reserve again.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

98fdbe73

bnxt_en: Update firmware interface spec. to 1.8.1.4. · 6a17eb27

Michael Chan authored Aug 28, 2017

Flow APIs are added in this firmware interface.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6a17eb27

Merge branch 'NCSI-vlan-filtering' · 33b86ba0

David S. Miller authored Aug 28, 2017

Samuel Mendoza-Jonas says:

====================
NCSI VLAN Filtering Support

This series (mainly patch 2) adds VLAN filtering to the NCSI implementation.
A fair amount of code already exists in the NCSI stack for VLAN filtering but
none of it is actually hooked up. This goes the final mile and fixes a few
bugs in the existing code found along the way (patch 1).

Patch 3 adds the appropriate flag and callbacks to the ftgmac100 driver to
enable filtering as it's a large consumer of NCSI (and what I've been
testing on).

v3:	- Add comment describing change to ncsi_find_filter()
	- Catch NULL in clear_one_vid() from ncsi_get_filter()
	- Simplify state changes when kicking updated channel
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

33b86ba0

ftgmac100: Support NCSI VLAN filtering when available · 51564585

Samuel Mendoza-Jonas authored Aug 28, 2017

Register the ndo_vlan_rx_{add,kill}_vid callbacks and set the
NETIF_F_HW_VLAN_CTAG_FILTER if NCSI is available.
This allows the VLAN core to notify the NCSI driver when changes occur
so that the remote NCSI channel can be properly configured to filter on
the set VLAN tags.
Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

51564585

net/ncsi: Configure VLAN tag filter · 21acf630

Samuel Mendoza-Jonas authored Aug 28, 2017

Make use of the ndo_vlan_rx_{add,kill}_vid callbacks to have the NCSI
stack process new VLAN tags and configure the channel VLAN filter
appropriately.
Several VLAN tags can be set and a "Set VLAN Filter" packet must be sent
for each one, meaning the ncsi_dev_state_config_svf state must be
repeated. An internal list of VLAN tags is maintained, and compared
against the current channel's ncsi_channel_filter in order to keep track
within the state. VLAN filters are removed in a similar manner, with the
introduction of the ncsi_dev_state_config_clear_vids state. The maximum
number of VLAN tag filters is determined by the "Get Capabilities"
response from the channel.
Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

21acf630

net/ncsi: Fix several packet definitions · 8579a67e

Samuel Mendoza-Jonas authored Aug 28, 2017

Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8579a67e

Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · a74e344a

David S. Miller authored Aug 28, 2017

Jeff Kirsher says:

====================
40GbE Intel Wired LAN Driver Updates 2017-08-27

This series contains updates to i40e and i40evf only.

Sudheer updates code comments and state variable so that adminq_subtask
will have accutate information whenever it gets scheduled.

Mariusz stores information about FEC modes, to be used to printing link
states information, so that we do not need to call admin queue when
reporting link status.  Adds VF support for controlling VLAN tag
stripping via ethtool.

Jake provides the majority of changes in this series, starting with
increasing the size of the prefix buffer so that it can hold enough
characters for every possible input, which prevents snprintf truncation.
Fixed other string truncation errors/warnings produced by GCC 7.x.
Removed an unnecessary workaround for resetting XPS.  Fixed an issue
where there is a mismatched affinity mask value, so initialize the value
to cpu_possible_mask and invert the logic for checking incorrect CPU vs
IRQ affinity so that the exceptional case is handled at the check.
Removed ULTRA latency mode due to several issues found and will be
looking at better solution for small packet workloads.

Akeem fixes an issue where the incorrect flag was being used to set
promiscuous mode for unicast, which was enabling promiscuous mode only
for multicast instead of unicast.

Carolyn fixes an issue where an error return value is set, but this
value can be overwritten before we actually do exit the function.  So
remove the error code assignment and add code comments for better
understanding on why we do not need to set and return the error.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

a74e344a

net-next/hinic: fix comparison of a uint16_t type with -1 · cde66f24

Aviad Krawczyk authored Aug 28, 2017

Remove the search for index of constant buffer size
Signed-off-by: Aviad Krawczyk <aviad.krawczyk@huawei.com>
Signed-off-by: Zhao Chen <zhaochen6@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cde66f24

net-next/hinic: Fix MTU limitation · 52f31422

Aviad Krawczyk authored Aug 28, 2017

Fix the hw MTU limitation by setting max_mtu
Signed-off-by: Aviad Krawczyk <aviad.krawczyk@huawei.com>
Signed-off-by: Zhao Chen <zhaochen6@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

52f31422

Merge branch 'irda-move-to-staging' · 70f5c982

David S. Miller authored Aug 28, 2017

Greg Kroah-Hartman says:

====================
irda: move it to drivers/staging so we can delete it

The IRDA code has long been obsolete and broken.  So, to keep people
from trying to use it, and to prevent people from having to maintain it,
let's move it to drivers/staging/ so that we can delete it entirely from
the kernel in a few releases.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

70f5c982

staging: irda: add a TODO file. · 333b3d07

Greg Kroah-Hartman authored Aug 27, 2017

The irda code will be deleted in a future kernel release, so no need to
have anyone do any new work on it.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

333b3d07

irda: move include/net/irda into staging subdirectory · 5bf916ee

Greg Kroah-Hartman authored Aug 27, 2017

And finally, move the irda include files into
drivers/staging/irda/include/net/irda.  Yes, it's a long path, but it
makes it easy for us to just add a Makefile directory path addition and
all of the net and drivers code "just works".
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

5bf916ee