Commits · fd734c5cca62b7630703244d3613be135d646a0e · Kirill Smelkov / linux

10 Nov, 2018 13 commits

libbpf: bpf_program__pin: add special case for instances.nr == 1 · fd734c5c

Stanislav Fomichev authored Nov 09, 2018

When bpf_program has only one instance, don't create a subdirectory with
per-instance pin files (<prog>/0). Instead, just create a single pin file
for that single instance. This simplifies object pinning by not creating
unnecessary subdirectories.

This can potentially break existing users that depend on the case
where '/0' is always created. However, I couldn't find any serious
usage of bpf_program__pin inside the kernel tree and I suppose there
should be none outside.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

fd734c5c

libbpf: cleanup after partial failure in bpf_object__pin · 0c19a9fb

Stanislav Fomichev authored Nov 09, 2018

bpftool will use bpf_object__pin in the next commits to pin all programs
and maps from the file; in case of a partial failure, we need to get
back to the clean state (undo previous program/map pins).

As part of a cleanup, I've added and exported separate routines to
pin all maps (bpf_object__pin_maps) and progs (bpf_object__pin_programs)
of an object.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

0c19a9fb

selftests/bpf: rename flow dissector section to flow_dissector · 108d50a9

Stanislav Fomichev authored Nov 09, 2018

Makes it compatible with the logic that derives program type
from section name in libbpf_prog_type_by_name.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

108d50a9

Merge branch 'device-ops-as-cb' · 0157edc8

Alexei Starovoitov authored Nov 10, 2018

Quentin Monnet says:

====================
For passing device functions for offloaded eBPF programs, there used to
be no place where to store the pointer without making the non-offloaded
programs pay a memory price.

As a consequence, three functions were called with ndo_bpf() through
specific commands. Now that we have struct bpf_offload_dev, and since none
of those operations rely on RTNL, we can turn these three commands into
hooks inside the struct bpf_prog_offload_ops, and pass them as part of
bpf_offload_dev_create().

This patch set changes the offload architecture to do so, and brings the
relevant changes to the nfp and netdevsim drivers.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

0157edc8

bpf: do not pass netdev to translate() and prepare() offload callbacks · 16a8cb5c

Quentin Monnet authored Nov 09, 2018

The kernel functions to prepare verifier and translate for offloaded
program retrieve "offload" from "prog", and "netdev" from "offload".
Then both "prog" and "netdev" are passed to the callbacks.

Simplify this by letting the drivers retrieve the net device themselves
from the offload object attached to prog - if they need it at all. There
is currently no need to pass the netdev as an argument to those
functions.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

16a8cb5c

bpf: pass prog instead of env to bpf_prog_offload_verifier_prep() · a40a2632

Quentin Monnet authored Nov 09, 2018

Function bpf_prog_offload_verifier_prep(), called from the kernel BPF
verifier to run a driver-specific callback for preparing for the
verification step for offloaded programs, takes a pointer to a struct
bpf_verifier_env object. However, no driver callback needs the whole
structure at this time: the two drivers supporting this, nfp and
netdevsim, only need a pointer to the struct bpf_prog instance held by
env.

Update the callback accordingly, on kernel side and in these two
drivers.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

a40a2632

bpf: pass destroy() as a callback and remove its ndo_bpf subcommand · eb911947

Quentin Monnet authored Nov 09, 2018

As part of the transition from ndo_bpf() to callbacks attached to struct
bpf_offload_dev for some of the eBPF offload operations, move the
functions related to program destruction to the struct and remove the
subcommand that was used to call them through the NDO.

Remove function __bpf_offload_ndo(), which is no longer used.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

eb911947

bpf: pass translate() as a callback and remove its ndo_bpf subcommand · b07ade27

Quentin Monnet authored Nov 09, 2018

As part of the transition from ndo_bpf() to callbacks attached to struct
bpf_offload_dev for some of the eBPF offload operations, move the
functions related to code translation to the struct and remove the
subcommand that was used to call them through the NDO.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

b07ade27

bpf: call verifier_prep from its callback in struct bpf_offload_dev · 00db12c3

Quentin Monnet authored Nov 09, 2018

In a way similar to the change previously brought to the verify_insn
hook and to the finalize callback, switch to the newly added ops in
struct bpf_prog_offload for calling the functions used to prepare driver
verifiers.

Since the dev_ops pointer in struct bpf_prog_offload is no longer used
by any callback, we can now remove it from struct bpf_prog_offload.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

00db12c3

bpf: call finalize() from its callback in struct bpf_offload_dev · 6dc18fa6

Quentin Monnet authored Nov 09, 2018

In a way similar to the change previously brought to the verify_insn
hook, switch to the newly added ops in struct bpf_prog_offload for
calling the functions used to perform final verification steps for
offloaded programs.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

6dc18fa6

bpf: call verify_insn from its callback in struct bpf_offload_dev · 341b3e7b

Quentin Monnet authored Nov 09, 2018

We intend to remove the dev_ops in struct bpf_prog_offload, and to only
keep the ops in struct bpf_offload_dev instead, which is accessible from
more locations for passing function pointers.

But dev_ops is used for calling the verify_insn hook. Switch to the
newly added ops in struct bpf_prog_offload instead.

To avoid table lookups for each eBPF instruction to verify, we remember
the offdev attached to a netdev and modify bpf_offload_find_netdev() to
avoid performing more than once a lookup for a given offload object.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

341b3e7b

bpf: pass a struct with offload callbacks to bpf_offload_dev_create() · 1385d755

Quentin Monnet authored Nov 09, 2018

For passing device functions for offloaded eBPF programs, there used to
be no place where to store the pointer without making the non-offloaded
programs pay a memory price.

As a consequence, three functions were called with ndo_bpf() through
specific commands. Now that we have struct bpf_offload_dev, and since
none of those operations rely on RTNL, we can turn these three commands
into hooks inside the struct bpf_prog_offload_ops, and pass them as part
of bpf_offload_dev_create().

This commit effectively passes a pointer to the struct to
bpf_offload_dev_create(). We temporarily have two struct
bpf_prog_offload_ops instances, one under offdev->ops and one under
offload->dev_ops. The next patches will make the transition towards the
former, so that offload->dev_ops can be removed, and callbacks relying
on ndo_bpf() added to offdev->ops as well.

While at it, rename "nfp_bpf_analyzer_ops" as "nfp_bpf_dev_ops" (and
similarly for netdevsim).
Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

1385d755

nfp: bpf: move nfp_bpf_analyzer_ops from verifier.c to offload.c · 1da6f573

Quentin Monnet authored Nov 09, 2018

We are about to add several new callbacks to the struct, all of them
defined in offload.c. Move the struct bpf_prog_offload_ops object in
that file. As a consequence, nfp_verify_insn() and nfp_finalize() can no
longer be static.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

1da6f573

09 Nov, 2018 7 commits

bpf: Extend the sk_lookup() helper to XDP hookpoint. · c8123ead

Nitin Hande authored Oct 28, 2018

This patch proposes to extend the sk_lookup() BPF API to the
XDP hookpoint. The sk_lookup() helper supports a lookup
on incoming packet to find the corresponding socket that will
receive this packet. Current support for this BPF API is
at the tc hookpoint. This patch will extend this API at XDP
hookpoint. A XDP program can map the incoming packet to the
5-tuple parameter and invoke the API to find the corresponding
socket structure.
Signed-off-by: Nitin Hande <Nitin.Hande@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

c8123ead

bpftool: Improve handling of ENOENT on map dumps · bf598a8f

David Ahern authored Nov 08, 2018

bpftool output is not user friendly when dumping a map with only a few
populated entries:

    $ bpftool map
    1: devmap  name tx_devmap  flags 0x0
            key 4B  value 4B  max_entries 64  memlock 4096B
    2: array  name tx_idxmap  flags 0x0
            key 4B  value 4B  max_entries 64  memlock 4096B

    $ bpftool map dump id 1
    key:
    00 00 00 00
    value:
    No such file or directory
    key:
    01 00 00 00
    value:
    No such file or directory
    key:
    02 00 00 00
    value:
    No such file or directory
    key: 03 00 00 00  value: 03 00 00 00

Handle ENOENT by keeping the line format sane and dumping
"<no entry>" for the value

    $ bpftool map dump id 1
    key: 00 00 00 00  value: <no entry>
    key: 01 00 00 00  value: <no entry>
    key: 02 00 00 00  value: <no entry>
    key: 03 00 00 00  value: 03 00 00 00
    ...
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

bf598a8f

selftests/bpf: add a test case for sock_ops perf-event notification · 435f90a3

Sowmini Varadhan authored Nov 07, 2018

This patch provides a tcp_bpf based eBPF sample. The test

- ncat(1) as the TCP client program to connect() to a port
  with the intention of triggerring SYN retransmissions: we
  first install an iptables DROP rule to make sure ncat SYNs are
  resent (instead of aborting instantly after a TCP RST)

- has a bpf kernel module that sends a perf-event notification for
  each TCP retransmit, and also tracks the number of such notifications
  sent in the global_map

The test passes when the number of event notifications intercepted
in user-space matches the value in the global_map.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

435f90a3

bpf: add perf event notificaton support for sock_ops · a5a3a828

Sowmini Varadhan authored Nov 07, 2018

This patch allows eBPF programs that use sock_ops to send perf
based event notifications using bpf_perf_event_output(). Our main
use case for this is the following:

  We would like to monitor some subset of TCP sockets in user-space,
  (the monitoring application would define 4-tuples it wants to monitor)
  using TCP_INFO stats to analyze reported problems. The idea is to
  use those stats to see where the bottlenecks are likely to be ("is
  it application-limited?" or "is there evidence of BufferBloat in
  the path?" etc).

  Today we can do this by periodically polling for tcp_info, but this
  could be made more efficient if the kernel would asynchronously
  notify the application via tcp_info when some "interesting"
  thresholds (e.g., "RTT variance > X", or "total_retrans > Y" etc)
  are reached. And to make this effective, it is better if
  we could apply the threshold check *before* constructing the
  tcp_info netlink notification, so that we don't waste resources
  constructing notifications that will be discarded by the filter.

This work solves the problem by adding perf event based notification
support for sock_ops. The eBPF program can thus be designed to apply
any desired filters to the bpf_sock_ops and trigger a perf event
notification based on the evaluation from the filter. The user space
component can use these perf event notifications to either read any
state managed by the eBPF program, or issue a TCP_INFO netlink call
if desired.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Co-developed-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

a5a3a828

Merge branch 'bpf-max-pkt-offset' · 185067a8

Daniel Borkmann authored Nov 09, 2018

Jiong Wang says:

====================
The maximum packet offset accessed by one BPF program is useful
information.

Because sometimes there could be packet split and it is possible for some
reasons (for example performance) we want to reject the BPF program if the
maximum packet size would trigger such split. Normally, MTU value is
treated as the maximum packet size, but one BPF program does not always
access the whole packet, it could only access the head portion of the data.

We could let verifier calculate the maximum packet offset ever used and
record it inside prog auxiliar information structure as a new field
"max_pkt_offset".
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

185067a8

nfp: bpf: relax prog rejection through max_pkt_offset · cf599f50

Jiong Wang authored Nov 08, 2018

NFP is refusing to offload programs whenever the MTU is set to a value
larger than the max packet bytes that fits in NFP Cluster Target Memory
(CTM). However, a eBPF program doesn't always need to access the whole
packet data.

Verifier has always calculated maximum direct packet access (DPA) offset,
and kept it in max_pkt_offset inside prog auxiliar information. This patch
relax prog rejection based on max_pkt_offset.
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

cf599f50

bpf: let verifier to calculate and record max_pkt_offset · e647815a

Jiong Wang authored Nov 08, 2018

In check_packet_access, update max_pkt_offset after the offset has passed
__check_packet_access.

It should be safe to use u32 for max_pkt_offset as explained in code
comment.

Also, when there is tail call, the max_pkt_offset of the called program is
unknown, so conservatively set max_pkt_offset to MAX_PACKET_OFF for such
case.
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

e647815a

07 Nov, 2018 20 commits

bpf_load: add map name to load_maps error message · bce6a149

Shannon Nelson authored Oct 29, 2018

To help when debugging bpf/xdp load issues, have the load_map()
error message include the number and name of the map that
failed.
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

bce6a149

tools: bpftool: adjust rlimit RLIMIT_MEMLOCK when loading programs, maps · 8302b9bd

Quentin Monnet authored Nov 07, 2018

The limit for memory locked in the kernel by a process is usually set to
64 kbytes by default. This can be an issue when creating large BPF maps
and/or loading many programs. A workaround is to raise this limit for
the current process before trying to create a new BPF map. Changing the
hard limit requires the CAP_SYS_RESOURCE and can usually only be done by
root user (for non-root users, a call to setrlimit fails (and sets
errno) and the program simply goes on with its rlimit unchanged).

There is no API to get the current amount of memory locked for a user,
therefore we cannot raise the limit only when required. One solution,
used by bcc, is to try to create the map, and on getting a EPERM error,
raising the limit to infinity before giving another try. Another
approach, used in iproute2, is to raise the limit in all cases, before
trying to create the map.

Here we do the same as in iproute2: the rlimit is raised to infinity
before trying to load programs or to create maps with bpftool.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

8302b9bd

selftests/bpf: enable (uncomment) all tests in test_libbpf.sh · f96afa76

Quentin Monnet authored Nov 07, 2018

libbpf is now able to load successfully test_l4lb_noinline.o and
samples/bpf/tracex3_kern.o.

For the test_l4lb_noinline, uncomment related tests from test_libbpf.c
and remove the associated "TODO".

For tracex3_kern.o, instead of loading a program from samples/bpf/ that
might not have been compiled at this stage, try loading a program from
BPF selftests. Since this test case is about loading a program compiled
without the "-target bpf" flag, change the Makefile to compile one
program accordingly (instead of passing the flag for compiling all
programs).

Regarding test_xdp_noinline.o: in its current shape the program fails to
load because it provides no version section, but the loader needs one.
The test was added to make sure that libbpf could load XDP programs even
if they do not provide a version number in a dedicated section. But
libbpf is already capable of doing that: in our case loading fails
because the loader does not know that this is an XDP program (it does
not need to, since it does not attach the program). So trying to load
test_xdp_noinline.o does not bring much here: just delete this subtest.

For the record, the error message obtained with tracex3_kern.o was
fixed by commit e3d91b0c ("tools/libbpf: handle issues with bpf ELF
objects containing .eh_frames")

I have not been abled to reproduce the "libbpf: incorrect bpf_call
opcode" error for test_l4lb_noinline.o, even with the version of libbpf
present at the time when test_libbpf.sh and test_libbpf_open.c were
created.

RFC -> v1:
- Compile test_xdp without the "-target bpf" flag, and try to load it
  instead of ../../samples/bpf/tracex3_kern.o.
- Delete test_xdp_noinline.o subtest.

Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

f96afa76

net: hns3: Remove set but not used variable 'reset_level' · f601a85b

YueHaibing authored Nov 07, 2018

Fixes gcc '-Wunused-but-set-variable' warning:

drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c: In function 'hclge_log_and_clear_ppp_error':
drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c:821:24: warning:
variable 'reset_level' set but not used [-Wunused-but-set-variable]
enum hnae3_reset_type reset_level = HNAE3_NONE_RESET;

It never used since introduction in commit
01865a50 ("net: hns3: Add enable and process hw errors of TM scheduler")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f601a85b

Merge branch 'nfp-more-set-actions-and-notifier-refactor' · 75790a74

David S. Miller authored Nov 07, 2018

Jakub Kicinski says:

====================
nfp: more set actions and notifier refactor

This series brings updates to flower offload code.  First Pieter adds
support for setting TTL, ToS, Flow Label and Hop Limit fields in IPv4
and IPv6 headers.

Remaining 5 patches deal with factoring out netdev notifiers from flower
code.  We already have two instances, and more is coming, so it's time
to move to one central notifier which then feeds individual feature
handlers.

I start that part by cleaning up the existing notifiers.  Next a central
notifier is added, and used by flower offloads.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

75790a74

nfp: flower: use the common netdev notifier · 0c665e2b

Jakub Kicinski authored Nov 06, 2018

Use driver's common notifier for LAG and tunnel configuration.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0c665e2b

nfp: register a notifier handler in a central location for the device · 3e333590

Jakub Kicinski authored Nov 06, 2018

Code interested in networking events registers its own notifier
handlers.  Create one device-wide notifier instance.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3e333590

nfp: flower: make nfp_fl_lag_changels_event() void · 659bb404

Jakub Kicinski authored Nov 06, 2018

nfp_fl_lag_changels_event() never fails, and therefore we would
never return NOTIFY_BAD for NETDEV_CHANGELOWERSTATE.  Make this
clearer by changing nfp_fl_lag_changels_event()'s return type
to void.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

659bb404

nfp: flower: don't try to nack device unregister events · a558c982

Jakub Kicinski authored Nov 06, 2018

Returning an error from a notifier means we want to veto the change.
We shouldn't veto NETDEV_UNREGISTER just because we couldn't find
the tracking info for given master.

I can't seem to find a way to trigger this unless we have some
other bug, so it's probably not fix-worthy.

While at it move the checking if the netdev really is of interest
into the handling functions, like we do for other events.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a558c982

nfp: flower: remove unnecessary iteration over devices · e50bfdf7

Jakub Kicinski authored Nov 06, 2018

For flower tunnel offloads FW has to be informed about MAC addresses
of tunnel devices.  We use a netdev notifier to keep track of these
addresses.

Remove unnecessary loop over netdevices after notifier is registered.
The intention of the loop was to catch devices which already existed
on the system before nfp driver got loaded, but netdev notifier will
replay NETDEV_REGISTER events.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e50bfdf7

nfp: flower: add ipv6 set flow label and hop limit offload · 4234d62c

Pieter Jansen van Vuuren authored Nov 06, 2018

Add ipv6 set flow label and hop limit action offload. Since pedit sets
headers per 4 byte word, we need to ensure that setting either version,
priority, payload_len or nexthdr does not get offloaded.
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4234d62c

nfp: flower: add ipv4 set ttl and tos offload · a3c6b063

Pieter Jansen van Vuuren authored Nov 06, 2018

Add ipv4 set ttl and tos action offload. Since pedit sets headers per 4
byte word, we need to ensure that setting either version, ihl, protocol,
total length or checksum does not get offloaded.
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a3c6b063

Merge branch 'hns3-next' · 6a02d1fa

David S. Miller authored Nov 07, 2018

Huazhong Tan says:

====================
hns3: provide new interfaces & bugfixes & code optimization

This patchset provides some reset interfaces for RAS & RoCE, also
some bugfixes and optimization related to reset.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

6a02d1fa

net: hns3: fix for cmd queue memory not freed problem during reset · 8b0195a3

Huazhong Tan authored Nov 07, 2018

It is not necessary to reallocate the descriptor and remap the
descriptor memory in reset process, otherwise it may cause memory
not freed problem.

Also, this patch initializes the cmd queue's spinlocks in
hclgevf_alloc_cmd_queue, and take the spinlocks when reinitializing
cmd queue' registers.

Fixes: fedd0c15 ("net: hns3: Add HNS3 VF IMP(Integrated Management Proc) cmd interface")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8b0195a3

net: hns3: add error handler for hclge_reset() · 65e41e7e

Huazhong Tan authored Nov 07, 2018

When hclge_reset() is called, it may fail for several reasons.
For example, an higher-level reset event occurs, memory allocation
failure, hardware reset timeout, etc. Therefore, it is necessary
to add corresponding error handling for these situations.
1. A high-level reset is required due to a high-level reset failure.
2. For memory allocation failure, a high-level reset is initiated by
the timer to recover. The reason for using the timer is to prevent this
new high-level reset to interrupt the reset process of other pf/vf;
3. For the case of hardware reset timeout, reschedule the reset task
to wait for the hardware to complete the reset.
For memory allocation failure and reset timeouts, in order to prevent
an infinite number of scheduled reset tasks, the number of error
recovery needs to be limited.

This patch also add some reset related debug log printing.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

65e41e7e

net: hns3: call roce's reset notify callback when resetting · f403a84f

Huazhong Tan authored Nov 07, 2018

While doing resetting, roce should do its uninitailization part
before nic's, and do its initialization part after nic's.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f403a84f

net: hns3: adjust the process of PF reset · 35d93a30

Huazhong Tan authored Nov 07, 2018

When doing PF reset, the driver needs to do some preparatory work
before asserting PF reset. Since when hardware is resetting, it
is necessary to stop tx/rx queue, clear hardware table, etc,
otherwise hardware may run into unrecoverable state if there is
still IO running when the hardware is resetting.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

35d93a30

net: hns3: move some reset information from hnae3_handle into hclge_dev/hclgevf_dev · 0742ed7c

Huazhong Tan authored Nov 07, 2018

Saving reset related information in the hclge_dev/hclgevf_dev
structure is more suitable than the hnae3_handle, since hardware
related information is kept in these two structure.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0742ed7c

net: hns3: ignore new coming low-level reset while doing high-level reset · 7cea834d

Huazhong Tan authored Nov 07, 2018

When processing a higher level reset, the pending lower level reset
does not have to be processed anymore, because the higher level
reset is the superset of the lower level reset.

Therefore, when processing an higher level reset, the request of
lower level reset needs to be cleared.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7cea834d

net: hns3: use HNS3_NIC_STATE_RESETTING to indicate resetting · 257e4f29

Huazhong Tan authored Nov 07, 2018

While hclge is going to reset, it will notify its client with
HNAE3_DOWN_CLIENT, so this client should get into a resetting
status from this moment, other operations from the stack need to
be blocked as well. And when the reset is finished, the client
will be notified with HNAE3_UP_CLIENT, so this is the end of
the resetting status.

This patch uses HNS3_NIC_STATE_RESETTING flag to implement that,
and adds hns3_nic_resetting() to indicate which operation is not
allowed.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

257e4f29