Commits · d0d443cddbefd5c9ab28b56c4a9463cac5f8ae17 · nexedi / linux

24 Mar, 2019 19 commits

net: aquantia: enable driver build for arm64 or compile_test · d0d443cd

Igor Russkikh authored Mar 23, 2019

The driver is now constantly tested in our lab on aarch64 hardware:
Jetson tx2, Pascal and Xavier tegra based hardware.
Many of tegra smmu related HW bugs were fixed or workarounded already.

Thus, add ARM64 into Kconfig.

Add also COMPILE_TEST dependency.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d0d443cd

net: aquantia: improve LRO configuration · 1eef4757

Nikita Danilov authored Mar 23, 2019

Default LRO HW configuration was very conservative.

Low Number of Descriptors per LRO Sequence, small session
timeout, inefficient settings in interrupt generation logic.

Change max number of LRO descriptors from 2 to 16 to
increase performance. Increase maximum coalescing interval
in HW to 250uS. Tune up HW LRO interrupt generation setting
to prevent hw issues with long LRO sessions.
Signed-off-by: Nikita Danilov <nikita.danilov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1eef4757

net: aquantia: Increase rx ring default size from 1K to 2K · 1b09e72d

Igor Russkikh authored Mar 23, 2019

For multigig rates 1K ring size is often not enough and causes extra
packet drops in hardware.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1b09e72d

net: aquantia: Make RX default frame size 2K · 8bd7e763

Igor Russkikh authored Mar 23, 2019

This correlates with default internet MTU. This also allows page
flip/reuse to be activated, since each allocated RX page now serves for
two frags/packets.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8bd7e763

net: aquantia: Introduce rx refill threshold value · 9773ef18

Igor Russkikh authored Mar 23, 2019

Before that, we've refilled ring even on single descriptor move.
Under high packet load that caused page allocation logic to be triggered
too often. That made overall ring processing slower.

Moreover, with page buffer reuse implemented, we should give a chance
higher networking levels to process received packets faster, release
the pages they consumed and therefore give a higher chance for these
pages to be reused.

RX ring is now refilled only when AQ_CFG_RX_REFILL_THRES or more
descriptors were processed (32 by default). Under regular traffic this
gives quite enough time for packet to be consumed and page to be reused.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9773ef18

net: aquantia: optimize rx performance by page reuse strategy · 46f4c29d

Igor Russkikh authored Mar 23, 2019

We introduce internal aq_rxpage wrapper over regular page
where extra field is tracked: rxpage offset inside of allocated page.

This offset allows to reuse one page for multiple packets.
When needed (for example with large frames processing), allocated
pageorder could be customized. This gives even larger page reuse
efficiency.

page_ref_count is used to track page users. If during rx refill
underlying page has users, we increase pg_off by rx frame size
thus the top half of the page is reused.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

46f4c29d

net: aquantia: optimize rx path using larger preallocated skb len · 7e2698c4

Igor Russkikh authored Mar 23, 2019

Atlantic driver used 14 bytes preallocated skb size. That made L3 protocol
processing inefficient because pskb_pull had to fetch all the L3/L4 headers
from extra fragments.

Specially on UDP flows that caused extra packet drops because CPU was
overloaded with pskb_pull.

This patch uses eth_get_headlen for skb preallocation.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7e2698c4

Merge tag 'mlx5-updates-2019-03-20' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · d64fee0a

David S. Miller authored Mar 23, 2019

Saeed Mahameed says:

====================
mlx5-updates-2019-03-20

This series includes updates to mlx5 driver,

1) Compiler warnings cleanup from Saeed Mahameed
2) Parav Pandit simplifies sriov enable/disables
3) Gustavo A. R. Silva, Removes a redundant assignment
4) Moshe Shemesh, Adds Geneve tunnel stateless offload support
5) Eli Britstein, Adds the Support for VLAN modify action and
   Replaces TC VLAN pop and push actions with VLAN modify

Note: This series includes two simple non-mlx5 patches,

1) Declare IANA_VXLAN_UDP_PORT definition in include/net/vxlan.h,
and use it in some drivers.
2) Declare GENEVE_UDP_PORT definition in include/net/geneve.h,
and use it in mlx5 and nfp drivers.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

d64fee0a

Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · 071d08af

David S. Miller authored Mar 23, 2019

Jeff Kirsher says:

====================
100GbE Intel Wired LAN Driver Updates 2019-03-22

This series contains updates to ice driver only.

Akeem enables MAC anti-spoofing by default when a new VSI is being
created.  Fixes an issue when reclaiming VF resources back to the pool
after reset, by freeing VF resources separately using the first VF
vector index to traverse the list, instead of starting at the last
assigned vectors list.  Added support for VF & PF promiscuous mode in
the ice driver.  Fixed the PF driver from letting the VF know it is "not
trusted" when it attempts to add more than its permitted additional MAC
addresses.  Altered how the driver gets the VF VSIs instances, instead
of using the mailbox messages to retrieve VSIs, get it directly via the
VF object in the PF data structure.

Bruce fixes return values to resolve static analysis warnings.  Made
whitespace changes to increase readability and reduce code wrapping.

Anirudh cleans up code by removing a function prototype that was never
implemented and removed an unused field in the ice_sched_vsi_info
structure.

Kiran fixes a potential divide by zero issue by adding a check.

Victor cleans up the transmit scheduler by adjusting the stack variable
usage and added/modified debug prints to make them more useful.

Yashaswini updates the driver in VEB mode to ensure that the LAN_EN bit
is set if all the right conditions are met.

Christopher ensures the loopback enable bit is not set for prune switch
rules, since all transmit traffic would be looped back to the internal
switch and dropped.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

071d08af

Merge branch 'tcp-rx-tx-cache' · bdaba895

David S. Miller authored Mar 23, 2019

Eric Dumazet says:

====================
tcp: add rx/tx cache to reduce lock contention

On hosts with many cpus we can observe a very serious contention
on spinlocks used in mm slab layer.

The following can happen quite often :

1) TX path
  sendmsg() allocates one (fclone) skb on CPU A, sends a clone.
  ACK is received on CPU B, and consumes the skb that was in the retransmit
  queue.

2) RX path
  network driver allocates skb on CPU C
  recvmsg() happens on CPU D, freeing the skb after it has been delivered
  to user space.

In both cases, we are hitting the asymetric alloc/free pattern
for which slab has to drain alien caches. At 8 Mpps per second,
this represents 16 Mpps alloc/free per second and has a huge penalty.

In an interesting experiment, I tried to use a single kmem_cache for all the skbs
(in skb_init() : skbuff_fclone_cache = skbuff_head_cache =
                  kmem_cache_create("skbuff_fclone_cache", sizeof(struct sk_buff_fclones),);
qnd most of the contention disappeared, since cpus could better use
their local slab per-cpu cache.

But we can do actually better, in the following patches.

TX : at ACK time, no longer free the skb but put it back in a tcp socket cache,
     so that next sendmsg() can reuse it immediately.

RX : at recvmsg() time, do not free the skb but put it in a tcp socket cache
   so that it can be freed by the cpu feeding the incoming packets in BH.

This increased the performance of small RPC benchmark by about 10 % on a host
with 112 hyperthreads.

v2 : - Solved a race condition : sk_stream_alloc_skb() to make sure the prior
       clone has been freed.
     - Really test rps_needed in sk_eat_skb() as claimed.
     - Fixed rps_needed use in drivers/net/tun.c

v3: Added a #ifdef CONFIG_RPS, to avoid compile error (kbuild robot)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

bdaba895

tcp: add one skb cache for rx · 8b27dae5

Eric Dumazet authored Mar 22, 2019

Often times, recvmsg() system calls and BH handling for a particular
TCP socket are done on different cpus.

This means the incoming skb had to be allocated on a cpu,
but freed on another.

This incurs a high spinlock contention in slab layer for small rpc,
but also a high number of cache line ping pongs for larger packets.

A full size GRO packet might use 45 page fragments, meaning
that up to 45 put_page() can be involved.

More over performing the __kfree_skb() in the recvmsg() context
adds a latency for user applications, and increase probability
of trapping them in backlog processing, since the BH handler
might found the socket owned by the user.

This patch, combined with the prior one increases the rpc
performance by about 10 % on servers with large number of cores.

(tcp_rr workload with 10,000 flows and 112 threads reach 9 Mpps
 instead of 8 Mpps)

This also increases single bulk flow performance on 40Gbit+ links,
since in this case there are often two cpus working in tandem :

 - CPU handling the NIC rx interrupts, feeding the receive queue,
  and (after this patch) freeing the skbs that were consumed.

 - CPU in recvmsg() system call, essentially 100 % busy copying out
  data to user space.

Having at most one skb in a per-socket cache has very little risk
of memory exhaustion, and since it is protected by socket lock,
its management is essentially free.

Note that if rps/rfs is used, we do not enable this feature, because
there is high chance that the same cpu is handling both the recvmsg()
system call and the TCP rx path, but that another cpu did the skb
allocations in the device driver right before the RPS/RFS logic.

To properly handle this case, it seems we would need to record
on which cpu skb was allocated, and use a different channel
to give skbs back to this cpu.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8b27dae5

tcp: add one skb cache for tx · 472c2e07

Eric Dumazet authored Mar 22, 2019

On hosts with a lot of cores, RPC workloads suffer from heavy contention on slab spinlocks.

    20.69%  [kernel]       [k] queued_spin_lock_slowpath
     5.64%  [kernel]       [k] _raw_spin_lock
     3.83%  [kernel]       [k] syscall_return_via_sysret
     3.48%  [kernel]       [k] __entry_text_start
     1.76%  [kernel]       [k] __netif_receive_skb_core
     1.64%  [kernel]       [k] __fget

For each sendmsg(), we allocate one skb, and free it at the time ACK packet comes.

In many cases, ACK packets are handled by another cpus, and this unfortunately
incurs heavy costs for slab layer.

This patch uses an extra pointer in socket structure, so that we try to reuse
the same skb and avoid these expensive costs.

We cache at most one skb per socket so this should be safe as far as
memory pressure is concerned.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

472c2e07

net: convert rps_needed and rfs_needed to new static branch api · dc05360f

Eric Dumazet authored Mar 22, 2019

We prefer static_branch_unlikely() over static_key_false() these days.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dc05360f

Merge branch 'net-dev-BYPASS-for-lockless-qdisc' · 7c1508e5

David S. Miller authored Mar 23, 2019

Paolo Abeni says:

====================
net: dev: BYPASS for lockless qdisc

This patch series is aimed at improving xmit performances of lockless qdisc
in the uncontended scenario.

After the lockless refactor pfifo_fast can't leverage the BYPASS optimization.
Due to retpolines the overhead for the avoidables enqueue and dequeue operations
has increased and we see measurable regressions.

The first patch introduces the BYPASS code path for lockless qdisc, and the
second one optimizes such path further. Overall this avoids up to 3 indirect
calls per xmit packet. Detailed performance figures are reported in the 2nd
patch.

 v2 -> v3:
  - qdisc_is_empty() has a const argument (Eric)

 v1 -> v2:
  - use really an 'empty' flag instead of 'not_empty', as
    suggested by Eric
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

7c1508e5

net: dev: introduce support for sch BYPASS for lockless qdisc · ba27b4cd

Paolo Abeni authored Mar 22, 2019

With commit c5ad119f ("net: sched: pfifo_fast use skb_array")
pfifo_fast no longer benefit from the TCQ_F_CAN_BYPASS optimization.
Due to retpolines the cost of the enqueue()/dequeue() pair has become
relevant and we observe measurable regression for the uncontended
scenario when the packet-rate is below line rate.

After commit 46b1c18f ("net: sched: put back q.qlen into a
single location") we can check for empty qdisc with a reasonably
fast operation even for nolock qdiscs.

This change extends TCQ_F_CAN_BYPASS support to nolock qdisc.
The new chunk of code mirrors closely the existing one for traditional
qdisc, leveraging a newly introduced helper to read atomically the
qdisc length.

Tested with pktgen in queue xmit mode, with pfifo_fast, a MQ
device, and MQ root qdisc:

threads         vanilla         patched
                kpps            kpps
1               2465            2889
2               4304            5188
4               7898            9589

Same as above, but with a single queue device:

threads         vanilla         patched
                kpps            kpps
1               2556            2827
2               2900            2900
4               5000            5000
8               4700            4700

No mesaurable changes in the contended scenarios, and more 10%
improvement in the uncontended ones.

 v1 -> v2:
  - rebased after flag name change
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Tested-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ba27b4cd

net: sched: add empty status flag for NOLOCK qdisc · 28cff537

Paolo Abeni authored Mar 22, 2019

The queue is marked not empty after acquiring the seqlock,
and it's up to the NOLOCK qdisc clearing such flag on dequeue.
Since the empty status lays on the same cache-line of the
seqlock, it's always hot on cache during the updates.

This makes the empty flag update a little bit loosy. Given
the lack of synchronization between enqueue and dequeue, this
is unavoidable.

v2 -> v3:
 - qdisc_is_empty() has a const argument (Eric)

v1 -> v2:
 - use really an 'empty' flag instead of 'not_empty', as
   suggested by Eric
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

28cff537

tcp: add documentation for tcp_ca_state · 576fd2f7

Soheil Hassas Yeganeh authored Mar 22, 2019

Add documentation to the tcp_ca_state enum, since this enum is
exposed in uapi.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Sowmini Varadhan <sowmini05@gmail.com>
Acked-by: Sowmini Varadhan <sowmini05@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

576fd2f7

tcp: remove conditional branches from tcp_mstamp_refresh() · e6d14070

Eric Dumazet authored Mar 22, 2019

tcp_clock_ns() (aka ktime_get_ns()) is using monotonic clock,
so the checks we had in tcp_mstamp_refresh() are no longer
relevant.

This patch removes cpu stall (when the cache line is not hot)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e6d14070

net: phy: Correct Cygnus/Omega PHY driver prompt · a7a01ab3

Florian Fainelli authored Mar 21, 2019

The tristate prompt should have been replaced rather than defined a few
lines below, rebase mistake.

Fixes: 17cc9821 ("net: phy: Move Omega PHY entry to Cygnus PHY driver")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a7a01ab3

22 Mar, 2019 21 commits

net/mlx5e: Replace TC VLAN pop and push actions with VLAN modify · 76b496b1

Eli Britstein authored Mar 21, 2019

Changing the VLAN header may be implemented by pop the existing header
and push a new one. Translate those operations as VLAN modify.
Applicable for use cases such as OVS where the controller translates a
vlan modify meta (OF) rule to DP pop+push actions rule.
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

76b496b1

net/mlx5e: Support VLAN modify action · bdc837ee

Eli Britstein authored Mar 21, 2019

Support VLAN modify action by emulating a rewrite action for the VLAN
fields. Currently, the only supported field is the vid. The prio in the
action must be set to 0 to indicate no change.
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

bdc837ee

net/mlx5e: Add VLAN ID rewrite fields · 0eb69bb9

Eli Britstein authored Mar 21, 2019

Add VLAN ID rewrite fields as a pre-step to support this rewrite.
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

0eb69bb9

net: Add IANA_VXLAN_UDP_PORT definition to vxlan header file · bea96410

Moshe Shemesh authored Mar 21, 2019

Added IANA_VXLAN_UDP_PORT (4789) definition to vxlan header file so it
can be used by drivers instead of local definition.
Updated drivers which locally defined it as 4789 to use it.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Cc: John Hurley <john.hurley@netronome.com>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: Yunsheng Lin <linyunsheng@huawei.com>
Cc: Peng Li <lipeng321@huawei.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

bea96410

net/mlx5e: TX, Add geneve tunnel stateless offload support · e3cfc7e6

Moshe Shemesh authored Mar 21, 2019

Currently support only default geneve udp port (6081).
For the tx side, the HW is assisted by SW parsing, which sets the
headers offset to offload tunneled LSO and csum. Note that for udp
tunnels, we don't use special rx offloads, as rss on the outer headers
is enough, we support checksum complete and GRO takes care of
aggregation.

Geneve TSO BW and CPU load results (tested using iperf single tcp
stream).
In this patch we add TSO support over Geneve, so the "before" result
doesn't actually get to using the TSO HW offload even when turned on.
Tested on ConnectX-5, Intel(R) Xeon(R) CPU E5-2660 v2 @2.20GHz.

 __________________________________
| Before         | After           |
|________________|_________________|
| 12.6 Gbits/sec | 21.7 Gbits/sec  |
| 100% CPU load  | 61.5% CPU load  |
|________________|_________________|
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

e3cfc7e6

net/mlx5e: Take SW parser code to a separate function · cac018b8

Moshe Shemesh authored Mar 21, 2019

Refactor mlx5e_ipsec_set_swp() code, split the part which sets the eseg
software parser (SWP) offsets and flags, so it can be used in a
downstream patch by other mlx5e functionality which needs to set eseg
SWP.
The new function mlx5e_set_eseg_swp() is useful for setting swp for both
outer and inner headers. It also handles the special ipsec case of xfrm
mode transfer.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

cac018b8

net: Move the definition of the default Geneve udp port to public header file · 974eff2b

Moshe Shemesh authored Mar 21, 2019

Move the definition of the default Geneve udp port from the geneve
source to the header file, so we can re-use it from drivers.
Modify existing drivers to use it.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Cc: John Hurley <john.hurley@netronome.com>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

974eff2b

net/mlx5e: Remove redundant assignment · bdde9311

Gustavo A. R. Silva authored Mar 21, 2019

Remove redundant assignment to tun_entropy->enabled.

Addesses-Coverity-ID: 1477328 ("Unused value")
Fixes: 97417f61 ("net/mlx5e: Fix GRE key by controlling port tunnel entropy calculation")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

bdde9311

net/mlx5e: Fix compilation warning in en_tc.c · ee576ec1

Saeed Mahameed authored Mar 21, 2019

Amazingly a mlx5e_tc function is being called from the eswitch layer,
which is by itself very terrible! The function was declared locally in
eswitch_offloads.c so it could be used there, which caused the following
compilation warning, fix that.

drivers/.../mlx5/core/en_tc.c:3242:6: [-Werror=missing-prototypes]
error: no previous prototype for ‘mlx5e_tc_clean_fdb_peer_flows’

Fixes: 04de7dda ("net/mlx5e: Infrastructure for duplicated offloading of TC flows")
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

ee576ec1

net/mlx5e: Fix port buffer function documentation format · d3669ca9

Saeed Mahameed authored Mar 21, 2019

This patch fixes compiler warnings:
In drivers/.../mlx5/core/en/port_buffer.c:190:
warning: Function parameter or member 'pfc_en' not described...
...
warning: Function parameter or member 'change' not described...

Fixes: 0696d608 ("net/mlx5e: Receive buffer configuration")
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

d3669ca9

net/mlx5: Fix compilation warning in eq.c · 092ead48

Saeed Mahameed authored Mar 21, 2019

mlx5_eq_table_get_rmap is being used only when CONFIG_RFS_ACCEL is
enabled, this patch fixes the below warning when CONFIG_RFS_ACCEL is
disabled.

drivers/.../mlx5/core/eq.c:903:18: [-Werror=missing-prototypes]
error: no previous prototype for ‘mlx5_eq_table_get_rmap’

Fixes: f2f3df55 ("net/mlx5: EQ, Privatize eq_table and friends")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

092ead48

net/mlx5: Simplify mlx5_sriov_is_enabled() by using pci core API · eb5cc431

Parav Pandit authored Mar 21, 2019

It is desired to get rid of num_vfs stored inside mlx5_core_sriov to
safely support vports more than vfs.
To reduce dependency on mlx5_core_sriov num_vfs, start using
pci_num_vf() from pci core.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

eb5cc431

net/mlx5: Rename total_vfs to total_vports · 2aca1787

Parav Pandit authored Mar 21, 2019

Macro MLX5_TOTAL_VPORTS() returns total number of vports. Therefore,
rename variable total_vfs to total_vports to improve code readability.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

2aca1787

net/mlx5: Simplify sriov enable/disable flow · 88d73849

Parav Pandit authored Mar 21, 2019

Simplify sriov enable/disable flow for below two checks.

1. PCI core driver allows sriov configuration only on a PF.
This is done in drivers/pci/pci-sysfs.c sriov_attrs_are_visible().

2. PCI core driver allow sriov enablement if the sriov is currently
disabled for for a PF. This is done in drivers/pci/pci-sysfs.c
sriov_numvfs_store().

Hence there is no need for mlx5 driver to duplicate such checks.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

88d73849

ice: Get VF VSI instances directly via PF · f1ef73f5

Akeem G Abodunrin authored Feb 26, 2019

This patch changes how we get VF VSIs instances. Instead of relying on
mailbox virtual channel message to retrieve VSI, it is more reliable
getting it directly via VF object in PF data structure.
Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

f1ef73f5

ice: Don't let VF know that it is untrusted · d84b899a

Akeem G Abodunrin authored Feb 26, 2019

Don't let the VF know it's not trusted when it tries to add more than
permitted additional MAC addresses.
Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

d84b899a

ice: Set LAN_EN for all directional rules · 26069b44

Yashaswini Raghuram Prathivadi Bhayankaram authored Feb 26, 2019

The LAN_EN bit for a switch rule determines if the packet can go out
on the wire or not. Set the LAN_EN flag in the switch action for all
directional rules.
Signed-off-by: Yashaswini Raghuram Prathivadi Bhayankaram <yashaswini.raghuram.prathivadi.bhayankaram@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

26069b44

ice: Do not set LB_EN for prune switch rules · b58dafbc

Christopher N Bednarz authored Feb 26, 2019

LB_EN for prune switch rules was causing all TX traffic
to loopback to the internal switch and dropped.  When
running bi-directional stress workloads with RDMA
the RDPU would hang blocking tx and rx traffic.
Signed-off-by: Christopher N Bednarz <christopher.n.bednarz@intel.com>
Reviewed-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

b58dafbc

ice: Enable LAN_EN for the right recipes · 277b3a45

Yashaswini Raghuram Prathivadi Bhayankaram authored Feb 26, 2019

In VEB mode, enable LAN_EN bit in the action fields for filter rules
corresponding to the right recipes.
Signed-off-by: Yashaswini Raghuram Prathivadi Bhayankaram <yashaswini.raghuram.prathivadi.bhayankaram@intel.com>
Reviewed-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

277b3a45

ice: Add support for PF/VF promiscuous mode · 5eda8afd

Akeem G Abodunrin authored Feb 26, 2019

Implement support for VF promiscuous mode, MAC/VLAN/MAC_VLAN and PF
multicast MAC/VLAN/MAC_VLAN promiscuous mode.
Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

5eda8afd

ice: code cleanup in ice_sched.c · e1ca65a3

Victor Raj authored Feb 26, 2019

This patch does some clean up in the Tx scheduler code:

1. Adjust the stack variable usage
2. Modify the debug prints to display the FW error
3. Add additional debug prints while adding/removing VSIs
Signed-off-by: Victor Raj <victor.raj@intel.com>
Reviewed-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1ca65a3