Commits · 8968c67a82ab7501bc3b9439c3624a49b42fe54c · nexedi / linux

27 Apr, 2019 1 commit

bpf, arm64: remove prefetch insn in xadd mapping · 8968c67a

Daniel Borkmann authored Apr 26, 2019

Prefetch-with-intent-to-write is currently part of the XADD mapping in
the AArch64 JIT and follows the kernel's implementation of atomic_add.
This may interfere with other threads executing the LDXR/STXR loop,
leading to potential starvation and fairness issues. Drop the optional
prefetch instruction.

Fixes: 85f68fe8 ("bpf, arm64: implement jiting of BPF_XADD")
Reported-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

8968c67a

26 Apr, 2019 6 commits

Merge branch 'btf-dump' · 0c0cad2c

Alexei Starovoitov authored Apr 25, 2019

Andrii Nakryiko says:

====================
This patch set adds a new `bpftool btf dump` sub-command, which allows to dump
BTF contents (only types for now). Currently it only outputs low-level
content, almost 1:1 with binary BTF format, but follow up patches will add
ability to dump BTF types as a compilable C header file. JSON output is
supported as well.

Patch #1 adds `btf` sub-command, dumping BTF types in human-readable format.
It also implements reading .BTF data from ELF file.
Patch #2 adds minimal documentation with output format examples and different
ways to specify source of BTF data.
Patch #3 adds support for btf command in bash-completion/bpftool script.
Patch #4 fixes minor indentation issue in bash-completion script.

Output format is mostly following existing format of BPF verifier log, but
deviates from it in few places. More details are in commit message for patch 1.

Example of output for all supported BTF kinds are in patch #2 as part of
documentation. Some field names are quite verbose and I'd rather shorten them,
if we don't feel like being very close to BPF verifier names is a necessity,
but in this patch I left them exactly the same as in verifier log.

v3->v4:
  - reverse Christmas tree (Quentin)
  - better docs (Quentin)

v2->v3:
  - make map's key|value|kv|all suggestion more precise (Quentin)
  - fix default case indentations (Quentin)

v1->v2:
  - fix unnecessary trailing whitespaces in bpftool-btf.rst (Yonghong)
  - add btf in main.c for a list of possible OBJECTs
  - handle unknown keyword under `bpftool btf dump` (Yonghong)
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

0c0cad2c

bpftool: fix indendation in bash-completion/bpftool · 8ed1875b

Andrii Nakryiko authored Apr 25, 2019

Fix misaligned default case branch for `prog dump` sub-command.
Reported-by: Quentin Monnet <quentin.monnet@netronome.com>
Cc: Yonghong Song <yhs@fb.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

8ed1875b

bpftool: add bash completions for btf command · 4a714fee

Andrii Nakryiko authored Apr 25, 2019

Add full support for btf command in bash-completion script.

Cc: Quentin Monnet <quentin.monnet@netronome.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

4a714fee

bpftool/docs: add btf sub-command documentation · ca253339

Andrii Nakryiko authored Apr 25, 2019

Document usage and sample output format for `btf dump` sub-command.

Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

ca253339

bpftool: add ability to dump BTF types · c93cc690

Andrii Nakryiko authored Apr 25, 2019

Add new `btf dump` sub-command to bpftool. It allows to dump
human-readable low-level BTF types representation of BTF types. BTF can
be retrieved from few different sources:
  - from BTF object by ID;
  - from PROG, if it has associated BTF;
  - from MAP, if it has associated BTF data; it's possible to narrow
    down types to either key type, value type, both, or all BTF types;
  - from ELF file (.BTF section).

Output format mostly follows BPF verifier log format with few notable
exceptions:
  - all the type/field/param/etc names are enclosed in single quotes to
    allow easier grepping and to stand out a little bit more;
  - FUNC_PROTO output follows STRUCT/UNION/ENUM format of having one
    line per each argument; this is more uniform and allows easy
    grepping, as opposed to succinct, but inconvenient format that BPF
    verifier log is using.

Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

c93cc690

bpftool: Fix errno variable usage · 77d76426

Benjamin Poirier authored Apr 11, 2019

The test meant to use the saved value of errno. Given the current code, it
makes no practical difference however.

Fixes: bf598a8f ("bpftool: Improve handling of ENOENT on map dumps")
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

77d76426

25 Apr, 2019 7 commits

bpftool: show flow_dissector attachment status · 7f0c57fe

Stanislav Fomichev authored Apr 25, 2019

Right now there is no way to query whether BPF flow_dissector program
is attached to a network namespace or not. In previous commit, I added
support for querying that info, show it when doing `bpftool net`:

$ bpftool prog loadall ./bpf_flow.o \
	/sys/fs/bpf/flow type flow_dissector \
	pinmaps /sys/fs/bpf/flow
$ bpftool prog
3: flow_dissector  name _dissect  tag 8c9e917b513dd5cc  gpl
        loaded_at 2019-04-23T16:14:48-0700  uid 0
        xlated 656B  jited 461B  memlock 4096B  map_ids 1,2
        btf_id 1
...

$ bpftool net -j
[{"xdp":[],"tc":[],"flow_dissector":[]}]

$ bpftool prog attach pinned \
	/sys/fs/bpf/flow/flow_dissector flow_dissector
$ bpftool net -j
[{"xdp":[],"tc":[],"flow_dissector":["id":3]}]

Doesn't show up in a different net namespace:
$ ip netns add test
$ ip netns exec test bpftool net -j
[{"xdp":[],"tc":[],"flow_dissector":[]}]

Non-json output:
$ bpftool net
xdp:

tc:

flow_dissector:
id 3

v2:
* initialization order (Jakub Kicinski)
* clear errno for batch mode (Quentin Monnet)
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

7f0c57fe

bpf: support BPF_PROG_QUERY for BPF_FLOW_DISSECTOR attach_type · 118c8e9a

Stanislav Fomichev authored Apr 25, 2019

target_fd is target namespace. If there is a flow dissector BPF program
attached to that namespace, its (single) id is returned.

v5:
* drop net ref right after rcu unlock (Daniel Borkmann)

v4:
* add missing put_net (Jann Horn)

v3:
* add missing inline to skb_flow_dissector_prog_query static def
  (kbuild test robot <lkp@intel.com>)

v2:
* don't sleep in rcu critical section (Jakub Kicinski)
* check input prog_cnt (exit early)

Cc: Jann Horn <jannh@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

118c8e9a

samples: bpf: add hbm sample to .gitignore · ead442a0

Daniel T. Lee authored Apr 25, 2019

This commit adds hbm to .gitignore which is
currently ommited from the ignore file.
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

ead442a0

libbpf: fix samples/bpf build failure due to undefined UINT32_MAX · 32e621e5

Daniel T. Lee authored Apr 24, 2019

Currently, building bpf samples will cause the following error.

    ./tools/lib/bpf/bpf.h:132:27: error: 'UINT32_MAX' undeclared here (not in a function) ..
     #define BPF_LOG_BUF_SIZE (UINT32_MAX >> 8) /* verifier maximum in kernels <= 5.1 */
                               ^
    ./samples/bpf/bpf_load.h:31:25: note: in expansion of macro 'BPF_LOG_BUF_SIZE'
     extern char bpf_log_buf[BPF_LOG_BUF_SIZE];
                             ^~~~~~~~~~~~~~~~

Due to commit 4519efa6 ("libbpf: fix BPF_LOG_BUF_SIZE off-by-one error")
hard-coded size of BPF_LOG_BUF_SIZE has been replaced with UINT32_MAX which is
defined in <stdint.h> header.

Even with this change, bpf selftests are running fine since these are built
with clang and it includes header(-idirafter) from clang/6.0.0/include.
(it has <stdint.h>)

    clang -I. -I./include/uapi -I../../../include/uapi -idirafter /usr/local/include -idirafter /usr/include \
    -idirafter /usr/lib/llvm-6.0/lib/clang/6.0.0/include -idirafter /usr/include/x86_64-linux-gnu \
    -Wno-compare-distinct-pointer-types -O2 -target bpf -emit-llvm -c progs/test_sysctl_prog.c -o - | \
    llc -march=bpf -mcpu=generic  -filetype=obj -o /linux/tools/testing/selftests/bpf/test_sysctl_prog.o

But bpf samples are compiled with GCC, and it only searches and includes
headers declared at the target file. As '#include <stdint.h>' hasn't been
declared in tools/lib/bpf/bpf.h, it causes build failure of bpf samples.

    gcc -Wp,-MD,./samples/bpf/.sockex3_user.o.d -Wall -Wmissing-prototypes -Wstrict-prototypes \
    -O2 -fomit-frame-pointer -std=gnu89 -I./usr/include -I./tools/lib/ -I./tools/testing/selftests/bpf/ \
    -I./tools/  lib/ -I./tools/include -I./tools/perf -c -o ./samples/bpf/sockex3_user.o ./samples/bpf/sockex3_user.c;

This commit add declaration of '#include <stdint.h>' to tools/lib/bpf/bpf.h
to fix this problem.
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

32e621e5

Merge branch 'libbpf-fixes' · 0e33d334

Alexei Starovoitov authored Apr 25, 2019

Daniel Borkmann says:

====================
Two small fixes in relation to global data handling. Thanks!
====================
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

0e33d334

bpf, libbpf: fix segfault in bpf_object__init_maps' pr_debug statement · 4f8827d2

Daniel Borkmann authored Apr 24, 2019

Ran into it while testing; in bpf_object__init_maps() data can be NULL
in the case where no map section is present. Therefore we simply cannot
access data->d_size before NULL test. Move the pr_debug() where it's
safe to access.

Fixes: d859900c ("bpf, libbpf: support global data/bss/rodata sections")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

4f8827d2

bpf, libbpf: handle old kernels more graceful wrt global data sections · 8837fe5d

Daniel Borkmann authored Apr 24, 2019

Andrii reported a corner case where e.g. global static data is present
in the BPF ELF file in form of .data/.bss/.rodata section, but without
any relocations to it. Such programs could be loaded before commit
d859900c ("bpf, libbpf: support global data/bss/rodata sections"),
whereas afterwards if kernel lacks support then loading would fail.

Add a probing mechanism which skips setting up libbpf internal maps
in case of missing kernel support. In presence of relocation entries,
we abort the load attempt.

Fixes: d859900c ("bpf, libbpf: support global data/bss/rodata sections")
Reported-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

8837fe5d

23 Apr, 2019 26 commits

Merge branch 'bpf-proto-fixes' · a21b48a2

Daniel Borkmann authored Apr 24, 2019

Willem de Bruijn says:

====================
Expand the tc tunnel encap support with protocols that convert the
network layer protocol, such as 6in4. This is analogous to existing
support in bpf_skb_proto_6_to_4.

Patch 1 implements the straightforward logic
Patch 2 tests it with a 6in4 tunnel

Changes v1->v2
  - improve documentation in test
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

a21b48a2

selftests/bpf: expand test_tc_tunnel with SIT encap · f6ad6acc

Willem de Bruijn authored Apr 23, 2019

So far, all BPF tc tunnel testcases encapsulate in the same network
protocol. Add an encap testcase that requires updating skb->protocol.

The 6in4 tunnel encapsulates an IPv6 packet inside an IPv4 tunnel.
Verify that bpf_skb_net_grow correctly updates skb->protocol to
select the right protocol handler in __netif_receive_skb_core.

The BPF program should also manually update the link layer header to
encode the right network protocol.

Changes v1->v2
  - improve documentation of non-obvious logic
Signed-off-by: Willem de Bruijn <willemb@google.com>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

f6ad6acc

bpf: update skb->protocol in bpf_skb_net_grow · 1b00e0df

Willem de Bruijn authored Apr 23, 2019

Some tunnels, like sit, change the network protocol of packet.
If so, update skb->protocol to match the new type.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

1b00e0df

Merge branch 'bpf-eth-get-headlen' · 2aad3261

Daniel Borkmann authored Apr 23, 2019

Stanislav Fomichev says:

====================
Currently, when eth_get_headlen calls flow dissector, it doesn't pass any
skb. Because we use passed skb to lookup associated networking namespace
to find whether we have a BPF program attached or not, we always use
C-based flow dissector in this case.

The goal of this patch series is to add new networking namespace argument
to the eth_get_headlen and make BPF flow dissector programs be able to
work in the skb-less case.

The series goes like this:
* use new kernel context (struct bpf_flow_dissector) for flow dissector
  programs; this makes it easy to distinguish between skb and no-skb
  case and supports calling BPF flow dissector on a chunk of raw data
* convert BPF_PROG_TEST_RUN to use raw data
* plumb network namespace into __skb_flow_dissect from all callers
* handle no-skb case in __skb_flow_dissect
* update eth_get_headlen to include net namespace argument and
  convert all existing users
* add selftest to make sure bpf_skb_load_bytes is not allowed in
  the no-skb mode
* extend test_progs to exercise skb-less flow dissection as well
* stop adjusting nhoff/thoff by ETH_HLEN in BPF_PROG_TEST_RUN

v6:
* more suggestions by Alexei:
  * eth_get_headlen now takes net dev, not net namespace
  * test skb-less case via tun eth_get_headlen
* fix return errors in bpf_flow_load
* don't adjust nhoff/thoff by ETH_HLEN

v5:
* API changes have been submitted via bpf/stable tree

v4:
* prohibit access to vlan fields as well (otherwise, inconsistent
  between skb/skb-less cases)
* drop extra unneeded check for skb->vlan_present in bpf_flow.c

v3:
* new kernel xdp_buff-like context per Alexei suggestion
* drop skb_net helper
* properly clamp flow_keys->nhoff

v2:
* moved temporary skb from stack into percpu (avoids memset of ~200 bytes
  per packet)
* tightened down access to __sk_buff fields from flow dissector programs to
  avoid touching shinfo (whitelist only relevant fields)
* addressed suggestions from Willem
====================
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

2aad3261

bpf/flow_dissector: don't adjust nhoff by ETH_HLEN in BPF_PROG_TEST_RUN · 02ee0658

Stanislav Fomichev authored Apr 22, 2019

Now that we use skb-less flow dissector let's return true nhoff and
thoff. We used to adjust them by ETH_HLEN because that's how it was
done in the skb case. For VLAN tests that looks confusing: nhoff is
pointing to vlan parts :-\

Warning, this is an API change for BPF_PROG_TEST_RUN! Feel free to drop
if you think that it's too late at this point to fix it.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

02ee0658

selftests/bpf: properly return error from bpf_flow_load · fe993c64

Stanislav Fomichev authored Apr 22, 2019

Right now we incorrectly return 'ret' which is always zero at that
point.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

fe993c64

selftests/bpf: run flow dissector tests in skb-less mode · 0905beec

Stanislav Fomichev authored Apr 22, 2019

Export last_dissection map from flow dissector and use a known place in
tun driver to trigger BPF flow dissection.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

0905beec

selftests/bpf: add flow dissector bpf_skb_load_bytes helper test · c9cb2c1e

Stanislav Fomichev authored Apr 22, 2019

When flow dissector is called without skb, we want to make sure
bpf_skb_load_bytes invocations return error. Add small test which tries
to read single byte from a packet.

bpf_skb_load_bytes should always fail under BPF_PROG_TEST_RUN because
it was converted to the skb-less mode.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

c9cb2c1e

net: pass net_device argument to the eth_get_headlen · c43f1255

Stanislav Fomichev authored Apr 22, 2019

Update all users of eth_get_headlen to pass network device, fetch
network namespace from it and pass it down to the flow dissector.
This commit is a noop until administrator inserts BPF flow dissector
program.

Cc: Maxim Krasnyansky <maxk@qti.qualcomm.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: intel-wired-lan@lists.osuosl.org
Cc: Yisen Zhuang <yisen.zhuang@huawei.com>
Cc: Salil Mehta <salil.mehta@huawei.com>
Cc: Michael Chan <michael.chan@broadcom.com>
Cc: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

c43f1255

flow_dissector: handle no-skb use case · 9b52e3f2

Stanislav Fomichev authored Apr 22, 2019

When called without skb, gather all required data from the
__skb_flow_dissect's arguments and use recently introduces
no-skb mode of bpf flow dissector.

Note: WARN_ON_ONCE(!net) will now trigger for eth_get_headlen users.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

9b52e3f2

net: plumb network namespace into __skb_flow_dissect · 3cbf4ffb

Stanislav Fomichev authored Apr 22, 2019

This new argument will be used in the next patches for the
eth_get_headlen use case. eth_get_headlen calls flow dissector
with only data (without skb) so there is currently no way to
pull attached BPF flow dissector program. With this new argument,
we can amend the callers to explicitly pass network namespace
so we can use attached BPF program.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

3cbf4ffb

bpf: when doing BPF_PROG_TEST_RUN for flow dissector use no-skb mode · 7b8a1304

Stanislav Fomichev authored Apr 22, 2019

Now that we have bpf_flow_dissect which can work on raw data,
use it when doing BPF_PROG_TEST_RUN for flow dissector.

Simplifies bpf_prog_test_run_flow_dissector and allows us to
test no-skb mode.

Note, that previously, with bpf_flow_dissect_skb we used to call
eth_type_trans which pulled L2 (ETH_HLEN) header and we explicitly called
skb_reset_network_header. That means flow_keys->nhoff would be
initialized to 0 (skb_network_offset) in init_flow_keys.
Now we call bpf_flow_dissect with nhoff set to ETH_HLEN and need
to undo it once the dissection is done to preserve the existing behavior.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

7b8a1304

flow_dissector: switch kernel context to struct bpf_flow_dissector · 089b19a9

Stanislav Fomichev authored Apr 22, 2019

struct bpf_flow_dissector has a small subset of sk_buff fields that
flow dissector BPF program is allowed to access and an optional
pointer to real skb. Real skb is used only in bpf_skb_load_bytes
helper to read non-linear data.

The real motivation for this is to be able to call flow dissector
from eth_get_headlen context where we don't have an skb and need
to dissect raw bytes.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

089b19a9

net: systemport: Remove need for DMA descriptor · 7e6e185c

Florian Fainelli authored Apr 22, 2019

All we do is write the length/status and address bits to a DMA
descriptor only to write its contents into on-chip registers right
after, eliminate this unnecessary step.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7e6e185c

bridge: Fix possible use-after-free when deleting bridge port · 697cd36c

Ido Schimmel authored Apr 22, 2019

When a bridge port is being deleted, do not dereference it later in
br_vlan_port_event() as it can result in a use-after-free [1] if the RCU
callback was executed before invoking the function.

[1]
[  129.638551] ==================================================================
[  129.646904] BUG: KASAN: use-after-free in br_vlan_port_event+0x53c/0x5fd
[  129.654406] Read of size 8 at addr ffff8881e4aa1ae8 by task ip/483
[  129.663008] CPU: 0 PID: 483 Comm: ip Not tainted 5.1.0-rc5-custom-02265-ga946bd73daac #1383
[  129.672359] Hardware name: Mellanox Technologies Ltd. MSN2100-CB2FO/SA001017, BIOS 5.6.5 06/07/2016
[  129.682484] Call Trace:
[  129.685242]  dump_stack+0xa9/0x10e
[  129.689068]  print_address_description.cold.2+0x9/0x25e
[  129.694930]  kasan_report.cold.3+0x78/0x9d
[  129.704420]  br_vlan_port_event+0x53c/0x5fd
[  129.728300]  br_device_event+0x2c7/0x7a0
[  129.741505]  notifier_call_chain+0xb5/0x1c0
[  129.746202]  rollback_registered_many+0x895/0xe90
[  129.793119]  unregister_netdevice_many+0x48/0x210
[  129.803384]  rtnl_delete_link+0xe1/0x140
[  129.815906]  rtnl_dellink+0x2a3/0x820
[  129.844166]  rtnetlink_rcv_msg+0x397/0x910
[  129.868517]  netlink_rcv_skb+0x137/0x3a0
[  129.882013]  netlink_unicast+0x49b/0x660
[  129.900019]  netlink_sendmsg+0x755/0xc90
[  129.915758]  ___sys_sendmsg+0x761/0x8e0
[  129.966315]  __sys_sendmsg+0xf0/0x1c0
[  129.988918]  do_syscall_64+0xa4/0x470
[  129.993032]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  129.998696] RIP: 0033:0x7ff578104b58
...
[  130.073811] Allocated by task 479:
[  130.077633]  __kasan_kmalloc.constprop.5+0xc1/0xd0
[  130.083008]  kmem_cache_alloc_trace+0x152/0x320
[  130.088090]  br_add_if+0x39c/0x1580
[  130.092005]  do_set_master+0x1aa/0x210
[  130.096211]  do_setlink+0x985/0x3100
[  130.100224]  __rtnl_newlink+0xc52/0x1380
[  130.104625]  rtnl_newlink+0x6b/0xa0
[  130.108541]  rtnetlink_rcv_msg+0x397/0x910
[  130.113136]  netlink_rcv_skb+0x137/0x3a0
[  130.117538]  netlink_unicast+0x49b/0x660
[  130.121939]  netlink_sendmsg+0x755/0xc90
[  130.126340]  ___sys_sendmsg+0x761/0x8e0
[  130.130645]  __sys_sendmsg+0xf0/0x1c0
[  130.134753]  do_syscall_64+0xa4/0x470
[  130.138864]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

[  130.146195] Freed by task 0:
[  130.149421]  __kasan_slab_free+0x125/0x170
[  130.154016]  kfree+0xf3/0x310
[  130.157349]  kobject_put+0x1a8/0x4c0
[  130.161363]  rcu_core+0x859/0x19b0
[  130.165175]  __do_softirq+0x250/0xa26
[  130.170956] The buggy address belongs to the object at ffff8881e4aa1ae8
                which belongs to the cache kmalloc-1k of size 1024
[  130.184972] The buggy address is located 0 bytes inside of
                1024-byte region [ffff8881e4aa1ae8, ffff8881e4aa1ee8)

Fixes: 9c0ec2e7 ("bridge: support binding vlan dev link state to vlan member bridge ports")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Cc: Mike Manning <mmanning@vyatta.att-mail.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Mike Manning <mmanning@vyatta.att-mail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

697cd36c

r8152: sync sa_family with the media type of network device · a6cbcb77

Crag.Wang authored Apr 22, 2019

Without this patch the socket address family sporadically gets wrong
value ends up the dev_set_mac_address() fails to set the desired MAC
address.

Fixes: 25766271 ("r8152: Refresh MAC address during USBDEVFS_RESET")
Signed-off-by: Crag.Wang <crag.wang@dell.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-By: Mario Limonciello <mario.limonciello@dell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a6cbcb77

Merge branch 'mlxsw-Shared-buffer-improvements' · 6f97955f

David S. Miller authored Apr 22, 2019

Ido Schimmel says:

====================
mlxsw: Shared buffer improvements

This patchset includes two improvements with regards to shared buffer
configuration in mlxsw.

The first part of this patchset forbids the user from performing illegal
shared buffer configuration that can result in unnecessary packet loss.
In order to better communicate these configuration failures to the user,
extack is propagated from devlink towards drivers. This is done in
patches #1-#8.

The second part of the patchset deals with the shared buffer
configuration of the CPU port. When a packet is trapped by the device,
it is sent across the PCI bus to the attached host CPU. From the
device's perspective, it is as if the packet is transmitted through the
CPU port.

While testing traffic directed at the CPU it became apparent that for
certain packet sizes and certain burst sizes, the current shared buffer
configuration of the CPU port is inadequate and results in packet drops.
The configuration is adjusted by patches #9-#14 that create two new pools
- ingress & egress - which are dedicated for CPU traffic.
====================
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6f97955f

mlxsw: spectrum_buffers: Adjust CPU port shared buffer egress quotas · 7a1ff9f4

Ido Schimmel authored Apr 22, 2019

Switch the CPU port to use the new dedicated egress pool instead the
previously used egress pool which was shared with normal front panel
ports.

Add per-port quotas for the amount of traffic that can be buffered for
the CPU port and also adjust the per-{port, TC} quotas.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7a1ff9f4

mlxsw: spectrum_buffers: Allow skipping ingress port quota configuration · 6d28725c

Ido Schimmel authored Apr 22, 2019

The CPU port is used to transmit traffic that is trapped to the host
CPU. It is therefore irrelevant to define ingress quota for it.

Add a 'skip_ingress' argument to the function tasked with configuring
per-port quotas, so that ingress quotas could be skipped in case the
passed local port is the CPU port.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6d28725c

mlxsw: spectrum_buffers: Split business logic from mlxsw_sp_port_sb_pms_init() · 24a7cc1e

Ido Schimmel authored Apr 22, 2019

The function is used to set the per-port shared buffer quotas.
Currently, these quotas are only set for front panel ports, but a
subsequent patch will configure these quotas for the CPU port as well.

The configuration required for the CPU port is a bit different than that
of the front panel ports, so split the business logic into a separate
function which will be called with different parameters for the CPU
port.

No functional changes intended.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

24a7cc1e

mlxsw: spectrum_buffers: Use new CPU ingress pool for control packets · 50b5b905

Ido Schimmel authored Apr 22, 2019

Use the new ingress pool that was added in the previous patch for
control packets (e.g., STP, LACP) that are trapped to the CPU.

The previous management pool is no longer necessary and therefore its
size is set to 0.

The maximum quota for traffic towards the CPU is increased to 50% of the
free space in the new ingress pool and therefore the reserved space is
reduced by half, to 10KB - in both the shared and headroom buffer. This
allows for more efficient utilization of the shared buffer as reserved
space cannot be used for other purposes.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

50b5b905

mlxsw: spectrum_buffers: Add pools for CPU traffic · 265c49b4

Ido Schimmel authored Apr 22, 2019

Packets that are trapped to the CPU are transmitted through the CPU port
to the attached host. The CPU port is therefore like any other port and
needs to have shared buffer configuration.

The maximum quotas configured for the CPU are provided using dynamic
threshold and cannot be changed by the user. In order to make sure that
these thresholds are always valid, the configuration of the threshold
type of these pools is forbidden.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

265c49b4

mlxsw: spectrum_buffers: Remove assumption about pool order · 857f138f

Ido Schimmel authored Apr 22, 2019

The code currently assumes that ingress pools have lower indices than
egress pools. This makes it impossible to add more ingress pools
without breaking user configuration that relies on a certain pool index
to correspond to an egress pool.

Remove such assumptions from the code, so that more ingress pools could
be added by subsequent patches.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

857f138f

mlxsw: spectrum_buffers: Forbid changing multicast TCs' attributes · f1aaeacd

Ido Schimmel authored Apr 22, 2019

Commit e83c045e ("mlxsw: spectrum_buffers: Configure MC pool")
configured the threshold of the multicast TCs as infinite so that the
admission of multicast packets is only depended on per-switch priority
threshold.

Forbid the user from changing the thresholds of these multicast TCs and
their binding to a different pool.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f1aaeacd

mlxsw: spectrum_buffers: Forbid changing threshold type of first egress pool · 51e15a49

Ido Schimmel authored Apr 22, 2019

Multicast packets have three egress quotas:
* Per egress port
* Per egress port and traffic class
* Per switch priority

The limits on the switch priority are not exposed to the user and
specified as dynamic threshold on the first egress pool.

Forbid changing the threshold type of the first egress pool so that
these limits are always valid.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

51e15a49

mlxsw: spectrum_buffers: Forbid configuration of multicast pool · cce7acca

Ido Schimmel authored Apr 22, 2019

Commit e83c045e ("mlxsw: spectrum_buffers: Configure MC pool") added
a dedicated pool for multicast traffic. The pool is visible to the user
so that it would be possible to monitor its occupancy, but its
configuration should be forbidden in order to maintain its intended
operation.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cce7acca