Commits · 1c03e03f9b5278701d4a0e3444b2de3b9ddc244b · Kirill Smelkov / linux

09 Oct, 2017 36 commits

nfp: bpf: pad code with valid nops · 1c03e03f

Jakub Kicinski authored Oct 08, 2017

We need to append up to 8 nops after last instruction to make
sure the CPU will not fetch garbage instructions with invalid
ECC if the code store was not initialized.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1c03e03f

nfp: bpf: calculate code store ECC · fd068ddc

Jakub Kicinski authored Oct 08, 2017

In the initial PoC firmware I simply disabled ECC on the instruction
store. Do the ECC calculation for generated instructions in the driver.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fd068ddc

nfp: bpf: move to datapath ABI version 2 · 18e53b6c

Jakub Kicinski authored Oct 08, 2017

Datapath ABI version 2 stores the packet information in LMEM
instead of NNRs.  We also have strict restrictions on which
GPRs we can use.  Only GPRs 0-23 are reserved for BPF.

Adjust the static register locations and "ABI" registers.
Note that packet length is packed with other info so we have
to extract it into one of the scratch registers, OTOH since
LMEM can be used in restricted operands we don't have to
extract packet pointer.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

18e53b6c

nfp: bpf: encode extended LM pointer operands · 995e101f

Jakub Kicinski authored Oct 08, 2017

Most instructions have special fields which allow switching
between base and extended Local Memory pointers.  Introduce
those to register encoding, we will use the extra LM pointers
to access high addresses of the stack.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

995e101f

nfp: bpf: encode LMEM accesses · 9f15d0f4

Jakub Kicinski authored Oct 08, 2017

NFP LMEM is a large, indirectly accessed register file.  There
are two basic indirect access registers.  Each access operation
may either use offset (up to 8 or 16 words) or perform post
decrement/increment.

Add encodings of LMEM indexes as instruction operands.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9f15d0f4

nfp: add more white space to the instruction defines · 8afd9c96

Jakub Kicinski authored Oct 08, 2017

We need to add longer OP_* defines, move the values away.
Purely whitespace commit.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8afd9c96

nfp: bpf: remove packet marking support · 509144e2

Jakub Kicinski authored Oct 08, 2017

Temporarily drop support for skb->mark.  We are primarily focusing
on XDP offload, and implementing skb->mark on the new datapath has
lower priority.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

509144e2

nfp: bpf: remove register rename · 226e0e94

Jakub Kicinski authored Oct 08, 2017

Remove the register renumbering optimization.  To implement calling
map and other helpers we need more strict register layout.  We can't
freely reassign register numbers.

This will have the effect of running in 4 context/thread mode, which
should be OK since we are moving towards integrating the BPF closer
with FW app datapath anyway, and the target datapath itself runs in
4 context mode.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

226e0e94

nfp: bpf: encode all 64bit shifts · 3cae1319

Jakub Kicinski authored Oct 08, 2017

Add encodings of all 64bit shift operations.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3cae1319

nfp: bpf: move software reg helpers and cmd table out of translator · 2a15bb1a

Jakub Kicinski authored Oct 08, 2017

Move the software reg helpers and some static data to nfp_asm.c.
They are related to the previous patch, but move is done in a separate
commit for ease of review.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2a15bb1a

nfp: bpf: use the power of sparse to check we encode registers right · b3f868df

Jakub Kicinski authored Oct 08, 2017

Define a new __bitwise type for software representation of registers.
This will allow us to catch incorrect parameter types using sparse.

Accessors we define also allow us to return correct enum type and
therefore ensure all switches handle all register types.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b3f868df

nfp: bpf: lift the single-port limitation · a52b35c3

Jakub Kicinski authored Oct 08, 2017

Limiting the eBPF offload to a single port was a workaround
required for the PoC application FW which has not been
released externally.  It's not necessary any more.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a52b35c3

nfp: output control messages to trace_devlink_hwmsg() · 3a4b0129

Jakub Kicinski authored Oct 08, 2017

Use standard devlink trace point to allow tracing of control
messages.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3a4b0129

Merge branch 'hns3-cleanups' · 1c3dc891

David S. Miller authored Oct 09, 2017

Yunsheng Lin says:

====================
A few cleanup for hns3 ethernet driver

This patchset contains a few cleanup for hns3 ethernet driver.
No functional change intended.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

1c3dc891

net: hns3: Cleanup for non-static function in hns3 driver · 1db9b1bf

Yunsheng Lin authored Oct 09, 2017

This patch fixes the following warning from sparse:
warning: symbol 'hns3_set_multicast_list' was not declared.
Should it be static.

hns3_set_multicast_list turns out to be not used, so delete it.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1db9b1bf

net: hns3: Cleanup for endian issue in hns3 driver · a90bb9a5

Yunsheng Lin authored Oct 09, 2017

This patch fixes a lot of endian issues detected by sparse.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a90bb9a5

net: hns3: Cleanup for struct that used to send cmd to firmware · d44f9b63

Yunsheng Lin authored Oct 09, 2017

The hclge_tm module has already added _cmd to the end of struct
that used to send cmd to firmware. This will help us finding the
endian issues.
This patch adds the _cmd to the end of struct that used to send
cmd to firmware in hclge_main module.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d44f9b63

net: hns3: Consistently using GENMASK in hns3 driver · 5392902d

Yunsheng Lin authored Oct 09, 2017

This patch uses GENMASK to generate bit mask whenever
possible in hns3 driver.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5392902d

net: hns3: Cleanup indentation for Kconfig in the the hisilicon folder · 56cf68c7

Yunsheng Lin authored Oct 09, 2017

This patch fixes a few indentation for Kconfig file in the
hisilicon folder.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

56cf68c7

net: hns3: Add hns3_get_handle macro in hns3 driver · 9780cb97

Yunsheng Lin authored Oct 09, 2017

There are many places that will need to get the handle
of netdev, so add a macro to get the handle of netdev.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9780cb97

net: hns3: Cleanup for shifting true in hns3 driver · 5bca3b94

Yunsheng Lin authored Oct 09, 2017

This patch fixes a shifting true in hclge_main module.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5bca3b94

qed: Delete redundant check on dcb_app priority · c49c777f

Christos Gkekas authored Oct 08, 2017

dcb_app priority is unsigned thus checking whether it is less than zero
is redundant.
Signed-off-by: Christos Gkekas <chris.gekas@gmail.com>
Acked-By: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c49c777f

net: ethernet: stmmac: Clean up dead code · c778c321

Christos Gkekas authored Oct 08, 2017

Many macros in dwmac-ipq806x are unused and should be removed.
Moreover gmac->id is an unsigned variable and therefore checking
whether it is less than zero is redundant.
Signed-off-by: Christos Gkekas <chris.gekas@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c778c321

Merge branch 'ipv6_dev_get_saddr-rcu' · bf6a119e

David S. Miller authored Oct 08, 2017

Eric Dumazet says:

====================
ipv6: ipv6_dev_get_saddr() rcu works

Sending IPv6 udp packets on non connected sockets is quite slow,
because ipv6_dev_get_saddr() is still using an rwlock and silly
references games on ifa.

Tested:

$ ./super_netperf 16 -H 4444::555:0786 -l 2000 -t UDP_STREAM -- -m 100 &
[1] 12527

Performance is boosted from 2.02 Mpps to 4.28 Mpps

Kernel profile before patches :
  22.62%  [kernel]  [k] _raw_read_lock_bh
   7.04%  [kernel]  [k] refcount_sub_and_test
   6.56%  [kernel]  [k] ipv6_get_saddr_eval
   5.67%  [kernel]  [k] _raw_read_unlock_bh
   5.34%  [kernel]  [k] __ipv6_dev_get_saddr
   4.95%  [kernel]  [k] refcount_inc_not_zero
   4.03%  [kernel]  [k] __ip6addrlbl_match
   3.70%  [kernel]  [k] _raw_spin_lock
   3.44%  [kernel]  [k] ipv6_dev_get_saddr
   3.24%  [kernel]  [k] ip6_pol_route
   3.06%  [kernel]  [k] refcount_add_not_zero
   2.30%  [kernel]  [k] __local_bh_enable_ip
   1.81%  [kernel]  [k] mlx4_en_xmit
   1.20%  [kernel]  [k] __ip6_append_data
   1.12%  [kernel]  [k] __ip6_make_skb
   1.11%  [kernel]  [k] __dev_queue_xmit
   1.06%  [kernel]  [k] l3mdev_master_ifindex_rcu

Kernel profile after patches :
  11.36%  [kernel]  [k] ip6_pol_route
   7.65%  [kernel]  [k] _raw_spin_lock
   7.16%  [kernel]  [k] __ipv6_dev_get_saddr
   6.49%  [kernel]  [k] ipv6_get_saddr_eval
   6.04%  [kernel]  [k] refcount_add_not_zero
   3.34%  [kernel]  [k] __ip6addrlbl_match
   2.62%  [kernel]  [k] __dev_queue_xmit
   2.37%  [kernel]  [k] mlx4_en_xmit
   2.26%  [kernel]  [k] dst_release
   1.89%  [kernel]  [k] __ip6_make_skb
   1.87%  [kernel]  [k] __ip6_append_data
   1.86%  [kernel]  [k] udpv6_sendmsg
   1.86%  [kernel]  [k] ip6t_do_table
   1.64%  [kernel]  [k] ipv6_dev_get_saddr
   1.64%  [kernel]  [k] find_match
   1.51%  [kernel]  [k] l3mdev_master_ifindex_rcu
   1.24%  [kernel]  [k] ipv6_addr_label
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

bf6a119e

ipv6: avoid cache line dirtying in ipv6_dev_get_saddr() · cc429c8f

Eric Dumazet authored Oct 07, 2017

By extending the rcu section a bit, we can avoid these
very expensive in6_ifa_put()/in6_ifa_hold() calls
done in __ipv6_dev_get_saddr() and ipv6_dev_get_saddr()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cc429c8f

ipv6: __ipv6_dev_get_saddr() rcu conversion · f59c031e

Eric Dumazet authored Oct 07, 2017

Callers hold rcu_read_lock(), so we do not need
the rcu_read_lock()/rcu_read_unlock() pair.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f59c031e

ipv6: ipv6_chk_prefix() rcu conversion · 24ba333b

Eric Dumazet authored Oct 07, 2017

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

24ba333b

ipv6: ipv6_chk_custom_prefix() rcu conversion · 47e26941

Eric Dumazet authored Oct 07, 2017

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

47e26941

ipv6: ipv6_count_addresses() rcu conversion · d9bf82c2

Eric Dumazet authored Oct 07, 2017

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d9bf82c2

ipv6: prepare RCU lookups for idev->addr_list · 8ef802aa

Eric Dumazet authored Oct 07, 2017

inet6_ifa_finish_destroy() already uses kfree_rcu() to free
inet6_ifaddr structs.

We need to use proper list additions/deletions in order
to allow readers to use RCU instead of idev->lock rwlock.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8ef802aa

Merge branch 'bridge-neigh-msg-proxy-and-flood-suppression-support' · a4231778

David S. Miller authored Oct 08, 2017

Roopa Prabhu says:

====================
bridge: neigh msg proxy and flood suppression support

This series implements arp and nd suppression in the bridge
driver for ethernet vpns. It implements rfc7432, section 10
https://tools.ietf.org/html/rfc7432#section-10
for ethernet VPN deployments. It is similar to the existing
BR_PROXYARP* flags but has a few semantic differences to conform
to EVPN standard. Unlike the existing flags, this new flag suppresses
flood of all neigh discovery packets (arp and nd) to tunnel ports.
Supports both vlan filtering and non-vlan filtering bridges.

In case of EVPN, it is mainly used to avoid flooding
of arp and nd packets to tunnel ports like vxlan.

v2 : rebase to latest + address some optimization feedback from Nikolay.
v3 : fix kbuild reported build errors with CONFIG_INET off
v4 : simplify port flag mask as suggested by stephen
v5 : address some feedback from Toshiaki
v6 : some v5 cleanups in nd suppress (keep it consistent with arp suppress)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

a4231778

bridge: suppress nd pkts on BR_NEIGH_SUPPRESS ports · ed842fae

Roopa Prabhu authored Oct 06, 2017

This patch avoids flooding and proxies ndisc packets
for BR_NEIGH_SUPPRESS ports.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ed842fae

bridge: suppress arp pkts on BR_NEIGH_SUPPRESS ports · 057658cb

Roopa Prabhu authored Oct 06, 2017

This patch avoids flooding and proxies arp packets
for BR_NEIGH_SUPPRESS ports.

Moves existing br_do_proxy_arp to br_do_proxy_suppress_arp
to support both proxy arp and neigh suppress.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

057658cb

bridge: add new BR_NEIGH_SUPPRESS port flag to suppress arp and nd flood · 821f1b21

Roopa Prabhu authored Oct 06, 2017

This patch adds a new bridge port flag BR_NEIGH_SUPPRESS to
suppress arp and nd flood on bridge ports. It implements
rfc7432, section 10.
https://tools.ietf.org/html/rfc7432#section-10
for ethernet VPN deployments. It is similar to the existing
BR_PROXYARP* flags but has a few semantic differences to conform
to EVPN standard. Unlike the existing flags, this new flag suppresses
flood of all neigh discovery packets (arp and nd) to tunnel ports.
Supports both vlan filtering and non-vlan filtering bridges.

In case of EVPN, it is mainly used to avoid flooding
of arp and nd packets to tunnel ports like vxlan.

This patch adds netlink and sysfs support to set this bridge port
flag.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

821f1b21

ipv6: fix a BUG in rt6_get_pcpu_route() · 951f788a

Eric Dumazet authored Oct 08, 2017

Ido reported following splat and provided a patch.

[  122.221814] BUG: using smp_processor_id() in preemptible [00000000] code: sshd/2672
[  122.221845] caller is debug_smp_processor_id+0x17/0x20
[  122.221866] CPU: 0 PID: 2672 Comm: sshd Not tainted 4.14.0-rc3-idosch-next-custom #639
[  122.221880] Hardware name: Mellanox Technologies Ltd. MSN2100-CB2FO/SA001017, BIOS 5.6.5 06/07/2016
[  122.221893] Call Trace:
[  122.221919]  dump_stack+0xb1/0x10c
[  122.221946]  ? _atomic_dec_and_lock+0x124/0x124
[  122.221974]  ? ___ratelimit+0xfe/0x240
[  122.222020]  check_preemption_disabled+0x173/0x1b0
[  122.222060]  debug_smp_processor_id+0x17/0x20
[  122.222083]  ip6_pol_route+0x1482/0x24a0
...

I believe we can simplify this code path a bit, since we no longer
hold a read_lock and need to release it to avoid a dead lock.

By disabling BH, we make sure we'll prevent code re-entry and
rt6_get_pcpu_route()/rt6_make_pcpu_route() run on the same cpu.

Fixes: 66f5d6ce ("ipv6: replace rwlock with rcu and spinlock in fib6_table")
Reported-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

951f788a

Merge tag 'mlx5-updates-2017-10-06' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux · 51a0c00c

David S. Miller authored Oct 08, 2017

Saeed Mahameed says:

====================
Mellanox, mlx5 updates 2017-10-06

This series includes some shared code updates for kernel 4.15 to both
net-next and rdma-next trees.

The series includes mlx5 low level flow steering updates and optimizations
to support firmware command parallelism for flow steering requests from
Maor Gottlieb and two other small fixes from Matan and Maor.

One fix from Matan adds error handling for when the destination
list of the flow steering rule is full.

Maor introduced a patch to avoid NULL pointer dereference on steering cleanup.

Then Some refactoring patches needed by the series for code sharing purposes.
and split the Flow Table Entry (FTE) and Flow Group (FG) creation code to two parts:
    1) Object allocation - allocate the steering node and initialize
    its resources.

    2) The firmware command execution.

This change will give us the ability to take write lock on the
parent node (e.g. FG for FTE creating) only on the software data struct allocation
and creation part of the procedure where the synchronization is really required,
and will allow us to execute multiple firmware commands simultaneously and overcome the
firmware bottleneck.

Refactor the locking scheme of the mlx5 core flow steering as follows:

1) Replace the mutex lock with readers-writers semaphore and take
    the write lock only when necessary (e.g. allocating a new flow
    table entry index or adding a node to the parent's children list).
    When we try to find a suitable child in the parent's children list
    (e.g. search for flow group with the same match_criteria of the rule)
    then we only take the read lock.

2) Add versioning mechanism - each steering entity (FT, FG, FTE, DST)
    will have an incremental version. The version is increased when the
    entity is changed (e.g. when a new FTE was added to FG - the FG's
    version is increased).
    Versioning is used in order to determine if the last traverse of an
    entity's children is valid or a rescan under write lock is required.

Last patch adds FGs and FTEs memory pool, It is useful because these objects
are not small and could be allocated/deallocated many times.

This support improves the insertion rate of steering rules
from ~5k/sec to ~40k/sec.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

51a0c00c

08 Oct, 2017 4 commits

Merge branch 'hv_netvsc-TCP-hash-level' · 28f50eb2

David S. Miller authored Oct 08, 2017

Haiyang Zhang says:

====================
hv_netvsc: support changing TCP hash level

The patch set simplifies the existing hash level switching code for
UDP. It also adds the support for changing TCP hash level. So users
can switch between L3 an L4 hash levels for TCP and UDP.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

28f50eb2

hv_netvsc: Update netvsc Document for TCP hash level setting · 78005d91

Haiyang Zhang authored Oct 06, 2017

Update Documentation/networking/netvsc.txt for TCP hash level setting
and related info.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

78005d91

hv_netvsc: Add ethtool handler to set and get TCP hash levels · 0518ec4f

Haiyang Zhang authored Oct 06, 2017

The patch supports the options to switch TCP hash level between
L3 and L4 by ethtool command. TCP over IPv4 and v6 can be set
differently. The default hash level is L4. We currently only
allow switching TX hash level from within the guests.

For example, for TCP over IPv4 on eth0:
To include TCP port numbers in hashing:
	ethtool -N eth0 rx-flow-hash tcp4 sdfn
To exclude TCP port numbers in hashing:
	ethtool -N eth0 rx-flow-hash tcp4 sd
To show TCP hash level:
	ethtool -n eth0 rx-flow-hash tcp4
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0518ec4f

hv_netvsc: Change the hash level variable to bit flags · 486e3981

Haiyang Zhang authored Oct 06, 2017

This simplifies the logic and make it easier to add more
options.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

486e3981