Commits · 59c84b9fcf42c99a945d5fdc49220d854e539690 · Kirill Smelkov / linux

12 Aug, 2019 1 commit

netdevsim: Restore per-network namespace accounting for fib entries · 59c84b9f

David Ahern authored Aug 06, 2019

Prior to the commit in the fixes tag, the resource controller in netdevsim
tracked fib entries and rules per network namespace. Restore that behavior.

Fixes: 5fc49422 ("netdevsim: create devlink instance per netdevsim instance")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

59c84b9f

11 Aug, 2019 1 commit

Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 9481382b

David S. Miller authored Aug 11, 2019

Daniel Borkmann says:

====================
pull-request: bpf 2019-08-11

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) x64 JIT code generation fix for backward-jumps to 1st insn, from Alexei.

2) Fix buggy multi-closing of BTF file descriptor in libbpf, from Andrii.

3) Fix libbpf_num_possible_cpus() to make it thread safe, from Takshak.

4) Fix bpftool to dump an error if pinning fails, from Jakub.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

9481382b

10 Aug, 2019 1 commit

net/tls: swap sk_write_space on close · 57c722e9

Jakub Kicinski authored Aug 09, 2019

Now that we swap the original proto and clear the ULP pointer
on close we have to make sure no callback will try to access
the freed state. sk_write_space is not part of sk_prot, remember
to swap it.

Reported-by: syzbot+dcdc9deefaec44785f32@syzkaller.appspotmail.com
Fixes: 95fa1454 ("bpf: sockmap/tls, close can race with map free")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

57c722e9

09 Aug, 2019 24 commits

hv_netvsc: Fix a warning of suspicious RCU usage · 6d0d779d

Dexuan Cui authored Aug 09, 2019

This fixes a warning of "suspicious rcu_dereference_check() usage"
when nload runs.

Fixes: 776e726b ("netvsc: fix RCU warning in get_stats")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6d0d779d

Merge tag 'mlx5-fixes-2019-08-08' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 9566e650

David S. Miller authored Aug 09, 2019

Saeed Mahameed says:

====================
Mellanox, mlx5 fixes 2019-08-08

This series introduces some fixes to mlx5 driver.

Highlights:
1) From Tariq, Critical mlx5 kTLS fixes to better align with hw specs.
2) From Aya, Fixes to mlx5 tx devlink health reporter.
3) From Maxim, aRFs parsing to use flow dissector to avoid relying on
invalid skb fields.

Please pull and let me know if there is any problem.

For -stable v4.3
 ('net/mlx5e: Only support tx/rx pause setting for port owner')
For -stable v4.9
 ('net/mlx5e: Use flow keys dissector to parse packets for ARFS')
For -stable v5.1
 ('net/mlx5e: Fix false negative indication on tx reporter CQE recovery')
 ('net/mlx5e: Remove redundant check in CQE recovery flow of tx reporter')
 ('net/mlx5e: ethtool, Avoid setting speed to 56GBASE when autoneg off')

Note: when merged with net-next this minor conflict will pop up:
++<<<<<<< (net-next)
 +      if (is_eswitch_flow) {
 +              flow->esw_attr->match_level = match_level;
 +              flow->esw_attr->tunnel_match_level = tunnel_match_level;
++=======
+       if (flow->flags & MLX5E_TC_FLOW_ESWITCH) {
+               flow->esw_attr->inner_match_level = inner_match_level;
+               flow->esw_attr->outer_match_level = outer_match_level;
++>>>>>>> (net)

To resolve, use hunks from net (2nd) and replace:
if (flow->flags & MLX5E_TC_FLOW_ESWITCH)
with
if (is_eswitch_flow)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

9566e650

ixgbe: fix possible deadlock in ixgbe_service_task() · 8b638160

Taehee Yoo authored Aug 08, 2019

ixgbe_service_task() calls unregister_netdev() under rtnl_lock().
But unregister_netdev() internally calls rtnl_lock().
So deadlock would occur.

Fixes: 59dd45d5 ("ixgbe: firmware recovery mode")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8b638160

Merge branch 'Fix-collisions-in-socket-cookie-generation' · 703acf62

David S. Miller authored Aug 09, 2019

Daniel Borkmann says:

====================
Fix collisions in socket cookie generation

This change makes the socket cookie generator as a global counter
instead of per netns in order to fix cookie collisions for BPF use
cases we ran into. See main patch #1 for more details.

Given the change is small/trivial and fixes an issue we're seeing
my preference would be net tree (though it cleanly applies to
net-next as well). Went for net tree instead of bpf tree here given
the main change is in net/core/sock_diag.c, but either way would be
fine with me.

v1 -> v2:
  - Fix up commit description in patch #1, thanks Eric!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

703acf62

bpf: sync bpf.h to tools infrastructure · 609a2ca5

Daniel Borkmann authored Aug 08, 2019

Pull in updates in BPF helper function description.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

609a2ca5

sock: make cookie generation global instead of per netns · cd48bdda

Daniel Borkmann authored Aug 08, 2019

Generating and retrieving socket cookies are a useful feature that is
exposed to BPF for various program types through bpf_get_socket_cookie()
helper.

The fact that the cookie counter is per netns is quite a limitation
for BPF in practice in particular for programs in host namespace that
use socket cookies as part of a map lookup key since they will be
causing socket cookie collisions e.g. when attached to BPF cgroup hooks
or cls_bpf on tc egress in host namespace handling container traffic
from veth or ipvlan devices with peer in different netns. Change the
counter to be global instead.

Socket cookie consumers must assume the value as opqaue in any case.
Not every socket must have a cookie generated and knowledge of the
counter value itself does not provide much value either way hence
conversion to global is fine.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Martynas Pumputis <m@lambda.lt>
Signed-off-by: David S. Miller <davem@davemloft.net>

cd48bdda

Merge tag 'rxrpc-fixes-20190809' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs · 7bac762d

David S. Miller authored Aug 09, 2019

David Howells says:

====================
Here's a couple of fixes for rxrpc:

 (1) Fix refcounting of the local endpoint.

 (2) Don't calculate or report packet skew information.  This has been
     obsolete since AFS 3.1 and so is a waste of resources.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

7bac762d

Merge branch 'bpf-bpftool-pinning-error-msg' · 4f7aafd7

Daniel Borkmann authored Aug 09, 2019

Jakub Kicinski says:

====================
First make sure we don't use "prog" in error messages because
the pinning operation could be performed on a map. Second add
back missing error message if pin syscall failed.
====================
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

4f7aafd7

tools: bpftool: add error message on pin failure · 3c7be384

Jakub Kicinski authored Aug 06, 2019

No error message is currently printed if the pin syscall
itself fails. It got lost in the loadall refactoring.

Fixes: 77380998 ("bpftool: add loadall command")
Reported-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

3c7be384

tools: bpftool: fix error message (prog -> object) · b3e78adc

Jakub Kicinski authored Aug 06, 2019

Change an error message to work for any object being
pinned not just programs.

Fixes: 71bb428f ("tools: bpf: add bpftool")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

b3e78adc

rxrpc: Don't bother generating maxSkew in the ACK packet · e8c3af6b

David Howells authored Aug 09, 2019

Don't bother generating maxSkew in the ACK packet as it has been obsolete
since AFS 3.1.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>

e8c3af6b

rxrpc: Fix local endpoint refcounting · 730c5fd4

David Howells authored Aug 09, 2019

The object lifetime management on the rxrpc_local struct is broken in that
the rxrpc_local_processor() function is expected to clean up and remove an
object - but it may get requeued by packets coming in on the backing UDP
socket once it starts running.

This may result in the assertion in rxrpc_local_rcu() firing because the
memory has been scheduled for RCU destruction whilst still queued:

	rxrpc: Assertion failed
	------------[ cut here ]------------
	kernel BUG at net/rxrpc/local_object.c:468!

Note that if the processor comes around before the RCU free function, it
will just do nothing because ->dead is true.

Fix this by adding a separate refcount to count active users of the
endpoint that causes the endpoint to be destroyed when it reaches 0.

The original refcount can then be used to refcount objects through the work
processor and cause the memory to be rcu freed when that reaches 0.

Fixes: 4f95dd78 ("rxrpc: Rework local endpoint management")
Reported-by: syzbot+1e0edc4b8b7494c28450@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>

730c5fd4

net: tundra: tsi108: use spin_lock_irqsave instead of spin_lock_irq in IRQ context · 8c25d088

Fuqian Huang authored Aug 09, 2019

As spin_unlock_irq will enable interrupts.
Function tsi108_stat_carry is called from interrupt handler tsi108_irq.
Interrupts are enabled in interrupt handler.
Use spin_lock_irqsave/spin_unlock_irqrestore instead of spin_(un)lock_irq
in IRQ context to avoid this.
Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8c25d088

team: Add vlan tx offload to hw_enc_features · 227f2f03

YueHaibing authored Aug 08, 2019

We should also enable team's vlan tx offload in hw_enc_features,
pass the vlan packets to the slave devices with vlan tci, let the
slave handle vlan tunneling offload implementation.

Fixes: 3268e5cb ("team: Advertise tunneling offload features")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

227f2f03

net/tls: prevent skb_orphan() from leaking TLS plain text with offload · 41477662

Jakub Kicinski authored Aug 07, 2019

sk_validate_xmit_skb() and drivers depend on the sk member of
struct sk_buff to identify segments requiring encryption.
Any operation which removes or does not preserve the original TLS
socket such as skb_orphan() or skb_clone() will cause clear text
leaks.

Make the TCP socket underlying an offloaded TLS connection
mark all skbs as decrypted, if TLS TX is in offload mode.
Then in sk_validate_xmit_skb() catch skbs which have no socket
(or a socket with no validation) and decrypted flag set.

Note that CONFIG_SOCK_VALIDATE_XMIT, CONFIG_TLS_DEVICE and
sk->sk_validate_xmit_skb are slightly interchangeable right now,
they all imply TLS offload. The new checks are guarded by
CONFIG_TLS_DEVICE because that's the option guarding the
sk_buff->decrypted member.

Second, smaller issue with orphaning is that it breaks
the guarantee that packets will be delivered to device
queues in-order. All TLS offload drivers depend on that
scheduling property. This means skb_orphan_partial()'s
trick of preserving partial socket references will cause
issues in the drivers. We need a full orphan, and as a
result netem delay/throttling will cause all TLS offload
skbs to be dropped.

Reusing the sk_buff->decrypted flag also protects from
leaking clear text when incoming, decrypted skb is redirected
(e.g. by TC).

See commit 0608c69c ("bpf: sk_msg, sock{map|hash} redirect
through ULP") for justification why the internal flag is safe.
The only location which could leak the flag in is tcp_bpf_sendmsg(),
which is taken care of by clearing the previously unused bit.

v2:
 - remove superfluous decrypted mark copy (Willem);
 - remove the stale doc entry (Boris);
 - rely entirely on EOR marking to prevent coalescing (Boris);
 - use an internal sendpages flag instead of marking the socket
   (Boris).
v3 (Willem):
 - reorganize the can_skb_orphan_partial() condition;
 - fix the flag leak-in through tcp_bpf_sendmsg.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

41477662

Merge branch 'skbedit-batch-fixes' · 0de94de1

David S. Miller authored Aug 08, 2019

Roman Mashak says:

====================
Fix batched event generation for skbedit action

When adding or deleting a batch of entries, the kernel sends up to
TCA_ACT_MAX_PRIO (defined to 32 in kernel) entries in an event to user
space. However it does not consider that the action sizes may vary and
require different skb sizes.

For example, consider the following script adding 32 entries with all
supported skbedit parameters and cookie (in order to maximize netlink
messages size):

% cat tc-batch.sh
TC="sudo /mnt/iproute2.git/tc/tc"

$TC actions flush action skbedit
for i in `seq 1 $1`;
do
   cmd="action skbedit queue_mapping 2 priority 10 mark 7/0xaabbccdd \
               ptype host inheritdsfield \
               index $i cookie aabbccddeeff112233445566778800a1 "
   args=$args$cmd
done
$TC actions add $args
%
% ./tc-batch.sh 32
Error: Failed to fill netlink attributes while adding TC action.
We have an error talking to the kernel
%

patch 1 adds callback in tc_action_ops of skbedit action, which calculates
the action size, and passes size to tcf_add_notify()/tcf_del_notify().

patch 2 updates the TDC test suite with relevant skbedit test cases.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

0de94de1

tc-testing: updated skbedit action tests with batch create/delete · 7bc16184

Roman Mashak authored Aug 07, 2019

Update TDC tests with cases varifying ability of TC to install or delete
batches of skbedit actions.
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7bc16184

net sched: update skbedit action for batched events operations · e1fea322

Roman Mashak authored Aug 07, 2019

Add get_fill_size() routine used to calculate the action size
when building a batch of events.

Fixes: ca9b0e27 ("pkt_action: add new action skbedit")
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e1fea322

net: dsa: sja1105: remove set but not used variables 'tx_vid' and 'rx_vid' · e3e3af9a

YueHaibing authored Aug 07, 2019

Fixes gcc '-Wunused-but-set-variable' warning:

drivers/net/dsa/sja1105/sja1105_main.c: In function sja1105_fdb_dump:
drivers/net/dsa/sja1105/sja1105_main.c:1226:14: warning:
 variable tx_vid set but not used [-Wunused-but-set-variable]
drivers/net/dsa/sja1105/sja1105_main.c:1226:6: warning:
 variable rx_vid set but not used [-Wunused-but-set-variable]

They are not used since commit 6d7c7d94 ("net: dsa:
sja1105: Fix broken learning with vlan_filtering disabled")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e3e3af9a

bonding: Add vlan tx offload to hw_enc_features · d595b03d

YueHaibing authored Aug 07, 2019

As commit 30d8177e ("bonding: Always enable vlan tx offload")
said, we should always enable bonding's vlan tx offload, pass the
vlan packets to the slave devices with vlan tci, let them to handle
vlan implementation.

Now if encapsulation protocols like VXLAN is used, skb->encapsulation
may be set, then the packet is passed to vlan device which based on
bonding device. However in netif_skb_features(), the check of
hw_enc_features:

	 if (skb->encapsulation)
                 features &= dev->hw_enc_features;

clears NETIF_F_HW_VLAN_CTAG_TX/NETIF_F_HW_VLAN_STAG_TX. This results
in same issue in commit 30d8177e like this:

vlan_dev_hard_start_xmit
  -->dev_queue_xmit
    -->validate_xmit_skb
      -->netif_skb_features //NETIF_F_HW_VLAN_CTAG_TX is cleared
      -->validate_xmit_vlan
        -->__vlan_hwaccel_push_inside //skb->tci is cleared
...
 --> bond_start_xmit
   --> bond_xmit_hash //BOND_XMIT_POLICY_ENCAP34
     --> __skb_flow_dissect // nhoff point to IP header
        -->  case htons(ETH_P_8021Q)
             // skb_vlan_tag_present is false, so
             vlan = __skb_header_pointer(skb, nhoff, sizeof(_vlan),
             //vlan point to ip header wrongly

Fixes: b2a103e6 ("bonding: convert to ndo_fix_features")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d595b03d

net: sched: sch_taprio: fix memleak in error path for sched list parse · 51650d33

Ivan Khoronzhuk authored Aug 07, 2019

In error case, all entries should be freed from the sched list
before deleting it. For simplicity use rcu way.

Fixes: 5a781ccb ("tc: Add support for configuring the taprio scheduler")
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

51650d33

net: docs: replace IPX in tuntap documentation · fe90689f

Stephen Hemminger authored Aug 05, 2019

IPX is no longer supported, but the example in the documentation
might useful. Replace it with IPv6.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

fe90689f

docs: admin-guide: remove references to IPX and token-ring · 7e7c076e

Stephen Hemminger authored Aug 05, 2019

Both IPX and TR have not been supported for a while now.
Remove them from the /proc/sys/net documentation.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

7e7c076e

xen/netback: Reset nr_frags before freeing skb · 3a0233dd

Ross Lagerwall authored Aug 05, 2019

At this point nr_frags has been incremented but the frag does not yet
have a page assigned so freeing the skb results in a crash. Reset
nr_frags before freeing the skb to prevent this.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3a0233dd

08 Aug, 2019 13 commits

inet: frags: re-introduce skb coalescing for local delivery · 891584f4

Guillaume Nault authored Aug 02, 2019

Before commit d4289fcc ("net: IP6 defrag: use rbtrees for IPv6
defrag"), a netperf UDP_STREAM test[0] using big IPv6 datagrams (thus
generating many fragments) and running over an IPsec tunnel, reported
more than 6Gbps throughput. After that patch, the same test gets only
9Mbps when receiving on a be2net nic (driver can make a big difference
here, for example, ixgbe doesn't seem to be affected).

By reusing the IPv4 defragmentation code, IPv6 lost fragment coalescing
(IPv4 fragment coalescing was dropped by commit 14fe22e3 ("Revert
"ipv4: use skb coalescing in defragmentation"")).

Without fragment coalescing, be2net runs out of Rx ring entries and
starts to drop frames (ethtool reports rx_drops_no_frags errors). Since
the netperf traffic is only composed of UDP fragments, any lost packet
prevents reassembly of the full datagram. Therefore, fragments which
have no possibility to ever get reassembled pile up in the reassembly
queue, until the memory accounting exeeds the threshold. At that point
no fragment is accepted anymore, which effectively discards all
netperf traffic.

When reassembly timeout expires, some stale fragments are removed from
the reassembly queue, so a few packets can be received, reassembled
and delivered to the netperf receiver. But the nic still drops frames
and soon the reassembly queue gets filled again with stale fragments.
These long time frames where no datagram can be received explain why
the performance drop is so significant.

Re-introducing fragment coalescing is enough to get the initial
performances again (6.6Gbps with be2net): driver doesn't drop frames
anymore (no more rx_drops_no_frags errors) and the reassembly engine
works at full speed.

This patch is quite conservative and only coalesces skbs for local
IPv4 and IPv6 delivery (in order to avoid changing skb geometry when
forwarding). Coalescing could be extended in the future if need be, as
more scenarios would probably benefit from it.

[0]: Test configuration
Sender:
ip xfrm policy flush
ip xfrm state flush
ip xfrm state add src fc00:1::1 dst fc00:2::1 proto esp spi 0x1000 aead 'rfc4106(gcm(aes))' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b 96 mode transport sel src fc00:1::1 dst fc00:2::1
ip xfrm policy add src fc00:1::1 dst fc00:2::1 dir in tmpl src fc00:1::1 dst fc00:2::1 proto esp mode transport action allow
ip xfrm state add src fc00:2::1 dst fc00:1::1 proto esp spi 0x1001 aead 'rfc4106(gcm(aes))' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b 96 mode transport sel src fc00:2::1 dst fc00:1::1
ip xfrm policy add src fc00:2::1 dst fc00:1::1 dir out tmpl src fc00:2::1 dst fc00:1::1 proto esp mode transport action allow
netserver -D -L fc00:2::1

Receiver:
ip xfrm policy flush
ip xfrm state flush
ip xfrm state add src fc00:2::1 dst fc00:1::1 proto esp spi 0x1001 aead 'rfc4106(gcm(aes))' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b 96 mode transport sel src fc00:2::1 dst fc00:1::1
ip xfrm policy add src fc00:2::1 dst fc00:1::1 dir in tmpl src fc00:2::1 dst fc00:1::1 proto esp mode transport action allow
ip xfrm state add src fc00:1::1 dst fc00:2::1 proto esp spi 0x1000 aead 'rfc4106(gcm(aes))' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b 96 mode transport sel src fc00:1::1 dst fc00:2::1
ip xfrm policy add src fc00:1::1 dst fc00:2::1 dir out tmpl src fc00:1::1 dst fc00:2::1 proto esp mode transport action allow
netperf -H fc00:2::1 -f k -P 0 -L fc00:1::1 -l 60 -t UDP_STREAM -I 99,5 -i 5,5 -T5,5 -6
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

891584f4

net/mlx5e: Remove redundant check in CQE recovery flow of tx reporter · a4e508ca

Aya Levin authored Aug 08, 2019

Remove check of recovery bit, in the beginning of the CQE recovery
function. This test is already performed right before the reporter
is invoked, when CQE error is detected.

Fixes: de8650a8 ("net/mlx5e: Add tx reporter support")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

a4e508ca

net/mlx5e: Fix error flow of CQE recovery on tx reporter · 276d197e

Aya Levin authored Aug 06, 2019

CQE recovery function begins with test and set of recovery bit. Add an
error flow which ensures clearing of this bit when leaving the recovery
function, to allow further recoveries to take place. This allows removal
of clearing recovery bit on sq activate.

Fixes: de8650a8 ("net/mlx5e: Add tx reporter support")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

276d197e

net/mlx5e: Fix false negative indication on tx reporter CQE recovery · d9a2fcf5

Aya Levin authored Aug 07, 2019

Remove wrong error return value when SQ is not in error state.
CQE recovery on TX reporter queries the sq state. If the sq is not in
error state, the sq is either in ready or reset state. Ready state is
good state which doesn't require recovery and reset state is a temporal
state which ends in ready state. With this patch, CQE recovery in this
scenario is successful.

Fixes: de8650a8 ("net/mlx5e: Add tx reporter support")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

d9a2fcf5

net/mlx5e: kTLS, Fix tisn field placement · b86f1abe

Tariq Toukan authored Jul 30, 2019

Shift the tisn field in the WQE control segment, per the
HW specification.

Fixes: d2ead1f3 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

b86f1abe

net/mlx5e: kTLS, Fix tisn field name · f1897b3c

Tariq Toukan authored Aug 08, 2019

Use the proper tisn field name from the union in struct mlx5_wqe_ctrl_seg.

Fixes: d2ead1f3 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

f1897b3c

net/mlx5e: kTLS, Fix progress params context WQE layout · a9bc3390

Tariq Toukan authored Jul 30, 2019

The TLS progress params context WQE should not include an
Eth segment, drop it.
In addition, align the tls_progress_params layout with the
HW specification document:
- fix the tisn field name.
- remove the valid bit.

Fixes: a12ff35e ("net/mlx5: Introduce TLS TX offload hardware bits and structures")
Fixes: d2ead1f3 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

a9bc3390

net/mlx5: kTLS, Fix wrong TIS opmod constants · 26149e3e

Tariq Toukan authored Jul 21, 2019

Fix the used constants for TLS TIS opmods, per the HW specification.

Fixes: a12ff35e ("net/mlx5: Introduce TLS TX offload hardware bits and structures")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

26149e3e

net/mlx5: crypto, Fix wrong offset in encryption key command · 55c9bd37

Tariq Toukan authored Jul 21, 2019

Fix the 128b key offset in key encryption key creation command,
per the HW specification.

Fixes: 45d3b55d ("net/mlx5: Add crypto library to support create/destroy encryption key")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

55c9bd37

net/mlx5e: ethtool, Avoid setting speed to 56GBASE when autoneg off · 5faf5b70

Mohamad Heib authored Apr 23, 2019

Setting speed to 56GBASE is allowed only with auto-negotiation enabled.

This patch prevent setting speed to 56GBASE when auto-negotiation disabled.

Fixes: f62b8bb8 ("net/mlx5: Extend mlx5_core to support ConnectX-4 Ethernet functionality")
Signed-off-by: Mohamad Heib <mohamadh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

5faf5b70

net/mlx5e: Only support tx/rx pause setting for port owner · 466df6eb

Huy Nguyen authored Aug 01, 2019

Only support changing tx/rx pause frame setting if the net device
is the vport group manager.

Fixes: 3c2d18ef ("net/mlx5e: Support ethtool get/set_pauseparam")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

466df6eb

net/mlx5: Support inner header match criteria for non decap flow action · 93b3586e

Huy Nguyen authored Jul 17, 2019

We have an issue that OVS application creates an offloaded drop rule
that drops VXLAN traffic with both inner and outer header match
criteria. mlx5_core driver detects correctly the inner and outer
header match criteria but does not enable the inner header match criteria
due to an incorrect assumption in mlx5_eswitch_add_offloaded_rule that
only decap rule needs inner header criteria.

Solution:
Remove mlx5_esw_flow_attr's match_level and tunnel_match_level and add
two new members: inner_match_level and outer_match_level.
inner/outer_match_level is set to NONE if the inner/outer match criteria
is not specified in the tc rule creation request. The decap assumption is
removed and the code just needs to check for inner/outer_match_level to
enable the corresponding bit in firmware's match_criteria_enable value.

Fixes: 6363651d ("net/mlx5e: Properly set steering match levels for offloaded TC decap rules")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

93b3586e

net/mlx5e: Use flow keys dissector to parse packets for ARFS · 405b93eb

Maxim Mikityanskiy authored Jul 05, 2019

The current ARFS code relies on certain fields to be set in the SKB
(e.g. transport_header) and extracts IP addresses and ports by custom
code that parses the packet. The necessary SKB fields, however, are not
always set at that point, which leads to an out-of-bounds access. Use
skb_flow_dissect_flow_keys() to get the necessary information reliably,
fix the out-of-bounds access and reuse the code.

Fixes: 18c908e4 ("net/mlx5e: Add accelerated RFS support")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

405b93eb