Commits · d8c09f19accb89fc08b246339abb005455e4c846 · nexedi / linux

27 Apr, 2018 40 commits

bnxt_en: Reserve rings in bnxt_set_channels() if device is down. · d8c09f19

Michael Chan authored Apr 26, 2018

The current code does not reserve rings during ethtool -L when the device
is down. The rings will be reserved when the device is later opened.

Change it to reserve rings during ethtool -L when the device is down.
This provides a better guarantee that the device open will be successful
when the rings are reserved ahead of time.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


bnxt_en: add debugfs support for DIM · cabfb09d

Andy Gospodarek authored Apr 26, 2018

This adds debugfs support for bnxt_en with the purpose of allowing users
to examine the current DIM profile in use for each receive queue. This
was instrumental in debugging issues found with DIM and ensuring that
the profiles we expect to use are the profiles being used.
Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


bnxt_en: reduce timeout on initial HWRM calls · 9751e8e7

Andy Gospodarek authored Apr 26, 2018

Testing with DIM enabled on older kernels indicated that firmware calls
were slower than expected. More detailed analysis indicated that the
default 25us delay was higher than necessary. Reducing the time spent in
usleep_range() for the first several calls would reduce the overall
latency of firmware calls on newer Intel processors.
Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
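
As a rough illustration of the idea in this commit, the sketch below uses a shorter delay for the first few polls of a firmware response and the default delay afterwards. The function names and the values SHORT_DELAY_US and SHORT_POLL_COUNT are hypothetical; only the 25us default comes from the commit message, and the driver's real constants may differ.

```c
#include <assert.h>

/* Hypothetical tuning values; only the 25us default is from the commit text. */
#define SHORT_DELAY_US   3u
#define DEFAULT_DELAY_US 25u
#define SHORT_POLL_COUNT 5u

/* Delay to sleep before poll number 'attempt' (0-based). */
unsigned int poll_delay_us(unsigned int attempt)
{
    /* The scheme only helps if the short delay is really shorter. */
    assert(SHORT_DELAY_US < DEFAULT_DELAY_US);
    return attempt < SHORT_POLL_COUNT ? SHORT_DELAY_US : DEFAULT_DELAY_US;
}

/* Total time slept across the first 'attempts' polls. */
unsigned int total_delay_us(unsigned int attempts)
{
    unsigned int total = 0;

    for (unsigned int i = 0; i < attempts; i++)
        total += poll_delay_us(i);
    return total;
}
```

With these assumed values, a response that arrives within the first five polls costs at most 15us of sleeping instead of 125us.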


bnxt_en: Increase RING_IDLE minimum threshold to 50 · 05abe4dd

Andy Gospodarek authored Apr 26, 2018

This keeps the RING_IDLE flag set in hardware for higher coalesce
settings by default and improves latency.
Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


bnxt_en: Do not allow VF to read EEPROM. · 4cebbaca

Michael Chan authored Apr 26, 2018

Firmware does not allow the operation and would return failure, causing
a warning in dmesg. So check for VF and disallow it in the driver.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


bnxt_en: Display function level rx/tx_discard_pkts via ethtool · 20c1d28e

Vasundhara Volam authored Apr 26, 2018

Add counters to display sum of rx/tx_discard_pkts of all rings as
function level statistics via ethtool.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


bnxt_en: Simplify ring alloc/free error messages. · 2727c888

Michael Chan authored Apr 26, 2018

Replace switch statements printing different messages for every ring type
with a common message.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


bnxt_en: Do not set firmware time from VF driver on older firmware. · ca2c39e2

Michael Chan authored Apr 26, 2018

Older firmware will reject this call and cause an error message to
be printed by the VF driver.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


bnxt_en: Check the lengths of encapsulated firmware responses. · 59895f59

Michael Chan authored Apr 26, 2018

Firmware messages that are forwarded from PF to VFs are encapsulated.
The size of these encapsulated messages must not exceed the maximum
defined message size. Add appropriate checks to avoid oversize
messages. Firmware messages may be expanded in future specs and
this will provide some guardrails to avoid data corruption.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
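
The kind of check described above can be sketched as a bounded copy that refuses oversize input. This is only an illustration: DEMO_MAX_ENCAP_LEN and encap_resp_copy() are hypothetical names, and the real maximum is defined by the firmware interface, not by the value used here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical limit; the real maximum comes from the firmware interface. */
#define DEMO_MAX_ENCAP_LEN 128

/*
 * Copy an encapsulated response into a buffer of DEMO_MAX_ENCAP_LEN bytes.
 * Returns false, and copies nothing, when the message would not fit.
 */
bool encap_resp_copy(unsigned char *dst, const unsigned char *src, size_t len)
{
    assert(dst != NULL && src != NULL);

    if (len > DEMO_MAX_ENCAP_LEN)
        return false;           /* oversize: reject instead of overrunning */

    memcpy(dst, src, len);
    return true;
}
```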


bnxt_en: Remap TC to hardware queues when configuring PFC. · d31cd579

Michael Chan authored Apr 26, 2018

Initially, the MQPRIO TCs are mapped 1:1 directly to the hardware
queues. Some of these hardware queues are configured to be lossless.
When PFC is enabled on one or more TCs, we now need to remap the
TCs that have PFC enabled to the lossless hardware queues.

After remapping, we need to close and open the NIC for the new
mapping to take effect. We also need to reprogram all ETS parameters.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


bnxt_en: Add TC to hardware QoS queue mapping logic. · 2e8ef77e

Michael Chan authored Apr 26, 2018

The current driver maps MQPRIO traffic classes directly 1:1 to the
internal hardware queues (TC0 maps to hardware queue 0, etc).  This
direct mapping requires the internal hardware queues to be reconfigured
from lossless to lossy and vice versa when necessary.  This
involves reconfiguring internal buffer thresholds which is
disruptive and not always reliable.

Implement a new scheme to map TCs to internal hardware queues by
matching up their PFC requirements.  This will eliminate the need
to reconfigure a hardware queue's internal buffers at run time.  After
remapping, the NIC is closed and opened for the new TC to hardware
queue mapping to take effect.

This patch only adds the basic mapping logic.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
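
A minimal sketch of matching TCs to queues by PFC requirement, under simplifying assumptions: each TC is either PFC-enabled or not, each hardware queue is either lossless or lossy, and the first unused queue of the matching kind is taken. The names (demo_map_tc_to_queue, DEMO_MAX_QUEUES) are hypothetical and this is not the driver's actual code.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_MAX_QUEUES 8

/*
 * Assign every TC to an unused hardware queue whose lossless property
 * matches the TC's PFC setting. tc_to_q[tc] receives the queue index.
 * Returns 0 on success, -1 if some TC has no matching free queue.
 */
int demo_map_tc_to_queue(const bool tc_pfc[], int num_tc,
                         const bool q_lossless[], int num_q,
                         int tc_to_q[])
{
    bool used[DEMO_MAX_QUEUES] = { false };

    assert(num_tc >= 0 && num_q >= 0 && num_q <= DEMO_MAX_QUEUES);

    for (int tc = 0; tc < num_tc; tc++) {
        tc_to_q[tc] = -1;
        for (int q = 0; q < num_q; q++) {
            if (!used[q] && q_lossless[q] == tc_pfc[tc]) {
                used[q] = true;
                tc_to_q[tc] = q;
                break;
            }
        }
        if (tc_to_q[tc] < 0)
            return -1;          /* no queue with the required property */
    }
    return 0;
}
```

For example, with TCs {lossy, PFC, lossy} and queues {lossless, lossy, lossy}, this sketch yields the mapping TC0 to queue 1, TC1 to queue 0, and TC2 to queue 2, so no queue has to change between lossless and lossy.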


hv_netvsc: simplify receive side calling arguments · c347b927

Stephen Hemminger authored Apr 26, 2018

The calls up from the napi poll reading the receive ring had many
places where an argument was being recreated, i.e., the caller already
had the value and wasn't passing it, so the callee would use a
known relationship to determine the same value. It is simpler and faster
to just pass the arguments needed.

Also, add const in a couple of places where the message is only being read.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


Merge branch 'sctp-refactor-MTU-handling' · 4422cc0d

David S. Miller authored Apr 27, 2018

Marcelo Ricardo Leitner says:

====================
sctp: refactor MTU handling

Currently MTU handling is spread over the SCTP stack. There are multiple
places doing the same/similar calculations and updating them is error prone
as one spot can easily be left out.

This patchset converges it into more concise and consistent code. In
general, it moves MTU handling from functions with bigger objectives,
such as sctp_assoc_add_peer(), to specific functions.

It's also a preparation for the next patchset, which removes the
duplication between sctp_make_op_error_space and
sctp_make_op_error_fixed and relies on sctp_mtu_payload introduced here.

More details on each patch.
====================
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


sctp: allow unsetting sockopt MAXSEG · 38687b56

Marcelo Ricardo Leitner authored Apr 26, 2018

RFC 6458 Section 8.1.16 says that setting MAXSEG as 0 means that the user
is not limiting it, and not that it should be set to the *current* maximum,
as we are doing.

This patch thus allows setting it as 0, effectively removing the user
limit.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
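
The semantics of a zero MAXSEG can be sketched as follows, assuming the stack keeps a separate path-derived limit. The helper demo_effective_maxseg() and its parameters are hypothetical and do not correspond to a kernel function.

```c
#include <assert.h>

/*
 * Effective fragmentation limit given the user's MAXSEG setting and the
 * limit derived from the path MTU. A user value of 0 means "no user
 * limit", so only the path-derived limit applies.
 */
unsigned int demo_effective_maxseg(unsigned int user_maxseg,
                                   unsigned int path_limit)
{
    assert(path_limit > 0);     /* a usable path always has a non-zero limit */

    if (user_maxseg == 0)
        return path_limit;
    return user_maxseg < path_limit ? user_maxseg : path_limit;
}
```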


sctp: consider idata chunks when setting SCTP_MAXSEG · 439ef030

Marcelo Ricardo Leitner authored Apr 26, 2018

When setting the SCTP_MAXSEG sock option, it should consider which kind of
data chunk is being used if the asoc is already available, so that the
limit better reflects reality.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


sctp: honor PMTU_DISABLED when handling icmp · 63d01330

Marcelo Ricardo Leitner authored Apr 26, 2018

sctp_sendmsg() could trigger PMTU updates even when PMTU_DISABLED was
set, as pmtu_pending could be set unconditionally during icmp handling
if the socket was in use by the application.

This patch fixes it by checking for PMTU_DISABLED when handling such
deferred updates.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


sctp: re-use sctp_transport_pmtu in sctp_transport_route · 6e91b578

Marcelo Ricardo Leitner authored Apr 26, 2018

sctp_transport_route currently is very similar to sctp_transport_pmtu plus
a few other bits.

This patch reuses sctp_transport_pmtu in sctp_transport_route and removes
the duplicated code.

Also, as all calls to sctp_transport_route were forcing the dst release
before calling it, let's just include such release too.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


sctp: remove sctp_transport_pmtu_check · 22d7be26

Marcelo Ricardo Leitner authored Apr 26, 2018

We are now keeping the MTU information synced between asoc, transport
and dst, which makes the check at sctp_packet_config() not needed
anymore. As it was the sole caller to this function, let's remove it.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


sctp: introduce sctp_dst_mtu · 6ff0f871

Marcelo Ricardo Leitner authored Apr 26, 2018

Which makes sure that the MTU respects the minimum value of
SCTP_DEFAULT_MINSEGMENT and that it is correctly aligned.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
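
A clamp-and-align helper of this kind might look like the sketch below. It assumes a minimum of 512 bytes for SCTP_DEFAULT_MINSEGMENT and rounding down to a 4-byte boundary; both are assumptions made for illustration, and demo_dst_mtu() is a hypothetical stand-in, not the kernel function.

```c
#include <assert.h>

/* Assumed value of SCTP_DEFAULT_MINSEGMENT for this sketch. */
#define DEMO_MIN_SEGMENT 512u

/* Raise the MTU to the minimum if needed, then round down to 4 bytes. */
unsigned int demo_dst_mtu(unsigned int mtu)
{
    unsigned int clamped = mtu < DEMO_MIN_SEGMENT ? DEMO_MIN_SEGMENT : mtu;
    unsigned int aligned = clamped & ~3u;

    /* The minimum is itself 4-byte aligned, so alignment cannot undercut it. */
    assert(aligned >= DEMO_MIN_SEGMENT && (aligned & 3u) == 0);
    return aligned;
}
```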


sctp: remove sctp_assoc_pending_pmtu · 2521680e

Marcelo Ricardo Leitner authored Apr 26, 2018

No need for this helper.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


sctp: introduce sctp_assoc_update_frag_point · 2f5e3c9d

Marcelo Ricardo Leitner authored Apr 26, 2018

and avoid the open-coded versions of it.

Now sctp_datamsg_from_user can just re-use asoc->frag_point as it will
always be updated.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


sctp: introduce sctp_mtu_payload · feddd6c1

Marcelo Ricardo Leitner authored Apr 26, 2018

When given an MTU, this function calculates how much payload we can carry
on it. Without an MTU, it calculates the amount of header overhead we
have.

So that when we have extra overhead, like the one added for IP options
on SELinux patches, it is easier to handle it.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
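
The two modes described above can be sketched as one function. The sketch assumes a 20-byte IPv4 header without options and the 12-byte SCTP common header; the 'extra' parameter stands for any additional overhead such as a chunk header or IP options. demo_mtu_payload() is a hypothetical simplification, not the kernel's signature.

```c
#include <assert.h>

#define DEMO_IP_HDR_LEN   20u   /* assumed: IPv4 header without options */
#define DEMO_SCTP_HDR_LEN 12u   /* SCTP common header */

/*
 * With a non-zero MTU, return the payload that fits after the headers.
 * With an MTU of 0, return the header overhead itself.
 */
unsigned int demo_mtu_payload(unsigned int mtu, unsigned int extra)
{
    unsigned int overhead = DEMO_IP_HDR_LEN + DEMO_SCTP_HDR_LEN + extra;
    unsigned int payload;

    if (mtu == 0)
        return overhead;

    /* An MTU smaller than the overhead leaves no room for payload. */
    payload = mtu > overhead ? mtu - overhead : 0;
    assert(payload < mtu);
    return payload;
}
```

Under these assumptions, a 1500-byte MTU with 16 bytes of extra overhead leaves 1452 bytes of payload, and the same call with an MTU of 0 reports 48 bytes of overhead.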


sctp: introduce sctp_assoc_set_pmtu · c4b2893d

Marcelo Ricardo Leitner authored Apr 26, 2018

All changes to asoc PMTU should now go through this wrapper, making it
easier to track them and to do other actions upon it.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


sctp: remove an if() that is always true · c88da20f

Marcelo Ricardo Leitner authored Apr 26, 2018

As noticed by Xin Long, the if() here is always true as PMTU can never
be 0.
Reported-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


sctp: move transport pathmtu calc away of sctp_assoc_add_peer · 800e00c1

Marcelo Ricardo Leitner authored Apr 26, 2018

There was only one case that sctp_assoc_add_peer couldn't handle, which
is when SPP_PMTUD_DISABLE is set and pathmtu not initialized.
So add this situation to sctp_transport_route and reuse what was
already in there.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


sctp: remove old and unused SCTP_MIN_PMTU · c45698f8

Marcelo Ricardo Leitner authored Apr 26, 2018

This value is not used anywhere in the code. In essence it is a
duplicate of SCTP_DEFAULT_MINSEGMENT, which is used by the stack.

The SCTP_MIN_PMTU value makes more sense, but we should not change to it now
as it would risk breaking applications.

So this patch removes SCTP_MIN_PMTU and adjusts the comment above it.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


selftests: pmtu: Minimum MTU for vti6 is 68 · 5a643c86

Stefano Brivio authored Apr 26, 2018

A vti6 interface can carry IPv4 packets too.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


tcp: remove mss check in tcp_select_initial_window() · c36207bd

Wei Wang authored Apr 26, 2018

In tcp_select_initial_window(), we only set rcv_wnd to
tcp_default_init_rwnd() if current mss > (1 << wscale). Otherwise,
rcv_wnd is kept at the full receive space of the socket which is a
value way larger than tcp_default_init_rwnd().
With larger initial rcv_wnd value, receive buffer autotuning logic
takes longer to kick in and increase the receive buffer.

In a TCP throughput test where the receiver has rmem[2] set to 125MB
(wscale is 11), we see the connection gets recvbuf limited at the
beginning of the connection and gets less throughput overall.
Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
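
To see why the removed check mattered in the test above, the condition can be evaluated in isolation. The sketch below reproduces only the comparison mss > (1 << wscale) from the commit message; demo_old_mss_check() is a hypothetical helper, and the MSS of 1460 used in the note is an assumed typical Ethernet value.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_MAX_WSCALE 14u     /* largest TCP window scale shift (RFC 7323) */

/*
 * The removed condition: the initial rcv_wnd was only reduced to the
 * default initial window when this returned true.
 */
bool demo_old_mss_check(unsigned int mss, unsigned int wscale)
{
    assert(wscale <= DEMO_MAX_WSCALE);
    return mss > (1u << wscale);
}
```

With wscale 11 the threshold is 2048, so an MSS of 1460 fails the check and rcv_wnd stays at the full receive space, which is the case the commit describes. With a smaller scale such as 7 the threshold is 128 and the same MSS passes.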


Merge branch 'smc-next' · 448c907c

David S. Miller authored Apr 27, 2018

Ursula Braun says:

====================
smc fixes from 2018-04-17 - v3

In the meantime, we challenged the benefit of these CLC handshake
optimizations for the sockopts TCP_NODELAY and TCP_CORK.
We decided to give up on them for now, since SMC still works
properly without.
There is now version 3 of the patch series with patches 2-4 implementing
sockopts that require special handling in SMC.

Version 3 changes
   * no deferring of setsockopts TCP_NODELAY and TCP_CORK anymore
   * allow fallback for some sockopts eliminating SMC usage
   * when setting TCP_NODELAY always enforce data transmission
     (not only together with corked data)

Version 2 changes of Patch 2/4 (and 3/4):
   * return error -EOPNOTSUPP for TCP_FASTOPEN sockopts
   * fix a kernel_setsockopt() usage bug by switching parameter
     variable from type "u8" to "int"
   * add return code validation when calling kernel_setsockopt()
   * propagate a setsockopt error on the internal CLC socket
     to the SMC socket.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>


net/smc: handle sockopt TCP_DEFER_ACCEPT · abb190f1

Ursula Braun authored Apr 26, 2018

If sockopt TCP_DEFER_ACCEPT is set, the accept is delayed till
data is available.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


net/smc: sockopts TCP_NODELAY and TCP_CORK · 01d2f7e2

Ursula Braun authored Apr 26, 2018

Setting sockopt TCP_NODELAY or resetting sockopt TCP_CORK
triggers data transfer.

For a corked SMC socket RDMA writes are deferred, if there is
still sufficient send buffer space available.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


net/smc: handle sockopts forcing fallback · ee9dfbef

Ursula Braun authored Apr 26, 2018

Several TCP sockopts do not work for SMC. One example is the
TCP_FASTOPEN sockopts, since SMC-connection setup is based on the TCP
three-way-handshake.
If the SMC socket is still in state SMC_INIT, such sockopts trigger
fallback to TCP. Otherwise an error is returned.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
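
The state-dependent behaviour described above can be sketched as below. The types, the two-state model, and the choice of -EINVAL as the error code are assumptions made for illustration; the commit message does not say which error is returned.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified socket model. */
enum demo_state { DEMO_INIT, DEMO_ACTIVE };

struct demo_sock {
    enum demo_state state;
    bool use_fallback;          /* true once the socket falls back to TCP */
};

/*
 * Handle a sockopt that SMC cannot support. Before connection setup the
 * socket falls back to plain TCP; afterwards the request is rejected.
 */
int demo_fallback_sockopt(struct demo_sock *sk)
{
    assert(sk != NULL);

    if (sk->state == DEMO_INIT) {
        sk->use_fallback = true;
        return 0;
    }
    return -EINVAL;             /* assumed error code for this sketch */
}
```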


net/smc: fix structure size · 33825761

Karsten Graul authored Apr 26, 2018

The struct smc_cdc_msg must be defined as packed so the
size is 44 bytes.
And change the structure size check so sizeof is checked.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
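
The effect of packing and of checking sizeof at build time can be shown with a small stand-in structure. The struct below is hypothetical and much smaller than smc_cdc_msg; __attribute__((packed)) is the GCC/Clang spelling, used here as an assumption in place of the kernel's own annotation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Without packing, the compiler may insert padding between members. */
struct demo_msg_unpacked {
    uint8_t  type;
    uint32_t token;
    uint16_t seq;
};

/* Packed: members are laid out back to back, 1 + 4 + 2 = 7 bytes. */
struct demo_msg_packed {
    uint8_t  type;
    uint32_t token;
    uint16_t seq;
} __attribute__((packed));

/* Build-time check on sizeof, so a layout change breaks the build. */
static_assert(sizeof(struct demo_msg_packed) == 7,
              "demo_msg_packed must be exactly 7 bytes");

size_t demo_packed_size(void)
{
    return sizeof(struct demo_msg_packed);
}

size_t demo_unpacked_size(void)
{
    return sizeof(struct demo_msg_unpacked);
}
```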


net: intel: Cleanup the copyright/license headers · 51dce24b

Jeff Kirsher authored Apr 26, 2018

After many years of having a ~30 line copyright and license header to our
source files, we are finally able to reduce that to one line with the
advent of the SPDX identifier.

Also caught a few files missing the SPDX license identifier, so fixed
them up.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@oracle.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


net: Fix coccinelle warning · 3f5ecd8a

Kirill Tkhai authored Apr 26, 2018

kbuild test robot says:

  >coccinelle warnings: (new ones prefixed by >>)
  >>> net/core/dev.c:1588:2-3: Unneeded semicolon

So, let's remove it.
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


geneve: fix build with modular IPV6 · 094be092

Tobias Regnery authored Apr 26, 2018

Commit c40e89fd ("geneve: configure MTU based on a lower device") added
an IS_ENABLED(CONFIG_IPV6) to geneve, leading to the following link error
with CONFIG_GENEVE=y and CONFIG_IPV6=m:

drivers/net/geneve.o: In function `geneve_link_config':
geneve.c:(.text+0x14c): undefined reference to `rt6_lookup'

Fix this by adding a Kconfig dependency and forcing GENEVE to be a module
when IPV6 is a module.

Fixes: c40e89fd ("geneve: configure MTU based on a lower device")
Signed-off-by: Tobias Regnery <tobias.regnery@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


Merge branch 's390-next' · c2335d67

David S. Miller authored Apr 27, 2018

Julian Wiedmann says:

====================
s390/net: updates 2018-04-26

Please apply the following patches to net-next. There are the usual
cleanups & small improvements, and Kittipon adds HW offload support
for IPv6 checksumming.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>


s390/qeth: improve fallback to random MAC address · 21b1702a

Julian Wiedmann authored Apr 26, 2018

If READ MAC fails to fetch a valid MAC address, allow some more device
types (IQD and z/VM OSD) to fall back to a random address.
Also use eth_hw_addr_random(), for indicating to userspace that the
address type is NET_ADDR_RANDOM.

Note that while z/VM has various protection schemes to prohibit
custom addresses on its NICs, they are all optional. So we should at
least give it a try.
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


s390/qeth: add IPv6 RX checksum offload support · d7e6ed97

Kittipon Meesompop authored Apr 26, 2018

Check if a qeth device supports IPv6 RX checksum offload, and hook it up
into the existing NETIF_F_RXCSUM support.
As NETIF_F_RXCSUM is now backed by a combination of HW Assists, we need
to be a little smarter when dealing with errors during a configuration
change:
- switching on NETIF_F_RXCSUM only makes sense if at least one HW Assist
  was enabled successfully.
- for switching off NETIF_F_RXCSUM, all available HW Assists need to be
  deactivated.
Signed-off-by: Kittipon Meesompop <kmeesomp@linux.vnet.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
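
The two rules in the list above can be sketched with a simplified model in which one feature flag is backed by several assists. The struct and function names are hypothetical, and failures are modelled by a plain flag instead of real hardware commands.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one HW Assist backing the RX checksum feature. */
struct demo_assist {
    bool available;             /* device offers this assist */
    bool active;                /* assist is currently enabled */
    bool enable_fails;          /* simulate a failure when enabling */
};

/* Switching on succeeds if at least one assist could be enabled. */
bool demo_rxcsum_on(struct demo_assist a[], int n)
{
    bool any = false;

    assert(a != NULL || n == 0);
    for (int i = 0; i < n; i++) {
        if (!a[i].available || a[i].enable_fails)
            continue;
        a[i].active = true;
        any = true;
    }
    return any;
}

/* Switching off walks every available assist, not just the first. */
void demo_rxcsum_off(struct demo_assist a[], int n)
{
    assert(a != NULL || n == 0);
    for (int i = 0; i < n; i++) {
        if (a[i].available)
            a[i].active = false;
    }
}
```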


s390/qeth: add IPv6 TX checksum offload support · 571f9dd8

Kittipon Meesompop authored Apr 26, 2018

Check if a qeth device supports IPv6 TX checksum offload, and advertise
NETIF_F_IPV6_CSUM accordingly. Add support for setting the relevant bits
in IPv6 packet descriptors.

Currently this has only limited use (i.e. UDP, or Jumbo Frames). For any
TCP traffic with a standard MSS, the TCP checksum gets calculated
as part of the linear GSO segmentation.
Signed-off-by: Kittipon Meesompop <kmeesomp@linux.vnet.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
