Commits · dd27c2e3d0a05c01ff14bb672d1a3f0fdd8f98fc · nexedi / linux

12 Feb, 2019 7 commits

bpf: offload: add priv field for drivers · dd27c2e3

Jakub Kicinski authored Feb 12, 2019

Currently bpf_offload_dev does not have any priv pointer, forcing
the drivers to work backwards from the netdev in program metadata.
This is not great given programs are conceptually associated with
the offload device, and it means one or two unnecessary deferences.
Add a priv pointer to bpf_offload_dev.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

dd27c2e3

tools: bpftool: doc, add text about feature-subcommand · ebbed0f4

Prashant Bhole authored Feb 12, 2019

This patch adds missing information about feature-subcommand in
bpftool.rst
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

ebbed0f4

Merge branch 'bpf-prog-build' · ecdf68e2

Alexei Starovoitov authored Feb 11, 2019

Jiong Wang says:

====================
This set improves bpf object file related rules in selftests Makefile.
  - tell git to ignore the build dir "alu32".
  - extend sub-register mode compilation to all bpf object files to give
    LLVM compiler bpf back-end more exercise.
  - auto-generate bpf kernel object file list.
  - relax sub-register mode compilation criteria.

v1 -> v2:
  - rename "kern_progs" to "progs". (Alexei)
  - spin a new patch to remove build server kernel requirement for
    sub-register mode compilation (Alexei)
  - rebase on top of KaFai’s latest "test_sock_fields" patch set.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

ecdf68e2

selftests: bpf: relax sub-register mode compilation criteria · 64e39ee2

Jiong Wang authored Feb 11, 2019

Sub-register mode compilation was enabled only when there are eBPF "v3"
processor supports at both compilation time inside LLVM and runtime inside
kernel.

Given separation betwen build and test server could be often, this patch
removes the runtime support criteria.
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

64e39ee2

selftests: bpf: centre kernel bpf objects under new subdir "progs" · bd4aed0e

Jiong Wang authored Feb 11, 2019

At the moment, all kernel bpf objects are listed under BPF_OBJ_FILES.
Listing them manually sometimes causing patch conflict when people are
adding new testcases simultaneously.

It is better to centre all the related source files under a subdir
"progs", then auto-generate the object file list.
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bd4aed0e

selftests: bpf: extend sub-register mode compilation to all bpf object files · 4836b463

Jiong Wang authored Feb 11, 2019

At the moment, we only do extra sub-register mode compilation on bpf object
files used by "test_progs". These object files are really loaded and
executed.

This patch further extends sub-register mode compilation to all bpf object
files, even those without corresponding runtime tests. Because this could
help testing LLVM sub-register code-gen, kernel bpf selftest has much more
C testcases with reasonable size and complexity compared with LLVM
testsuite which only contains unit tests.

There were some file duplication inside BPF_OBJ_FILES_DUAL_COMPILE which
is removed now.
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

4836b463

selftests: bpf: add "alu32" to .gitignore · 1727a9dc

Jiong Wang authored Feb 11, 2019

"alu32" is a build dir and contains various files for BPF sub-register
code-gen testing.

This patch tells git to ignore it.
Suggested-by: Yonghong Song <yhs@fb.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

1727a9dc

11 Feb, 2019 9 commits

Merge branch 'skb_sk-sk_fullsock-tcp_sock' · d105fa98

Alexei Starovoitov authored Feb 10, 2019

Martin KaFai Lau says:

====================
This series adds __sk_buff->sk, "struct bpf_tcp_sock",
BPF_FUNC_sk_fullsock and BPF_FUNC_tcp_sock.  Together, they provide
a common way to expose the members of "struct tcp_sock" and
"struct bpf_sock" for the bpf_prog to access.

The patch series first adds a bpf_sock pointer to __sk_buff
and a new helper BPF_FUNC_sk_fullsock.

It then adds BPF_FUNC_tcp_sock to get a bpf_tcp_sock
pointer from a bpf_sock pointer.

The current use case is to allow a cg_skb_bpf_prog to provide
per cgroup traffic policing/shaping.

Please see individual patch for details.

v2:
- Patch 1 depends on
  commit d6238766 ("bpf: Fix narrow load on a bpf_sock returned from sk_lookup()")
  in the bpf branch.
- Add sk_to_full_sk() to bpf_sk_fullsock() and bpf_tcp_sock()
  such that there is a way to access the listener's sk and tcp_sk
  when __sk_buff->sk is a request_sock.
  The comments in the uapi bpf.h is updated accordingly.
- bpf_ctx_range_till() is used in bpf_sock_common_is_valid_access()
  in patch 1.  Saved a few lines.
- Patch 2 is new in v2 and it adds "state", "dst_ip4", "dst_ip6" and
  "dst_port" to the bpf_sock.  Narrow load is allowed on them.
  The "state" (i.e. sk_state) has already been used in
  INET_DIAG (e.g. ss -t) and getsockopt(TCP_INFO).
- While at it in the new patch 2, also allow narrow load on some
  existing fields of the bpf_sock, which are "family", "type", "protocol"
  and "src_port".  Only allow loading from first byte for now.
  i.e. does not allow narrow load starting from the 2nd byte.
- Add some narrow load tests to the test_verifier's sock.c
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

d105fa98

bpf: Add test_sock_fields for skb->sk and bpf_tcp_sock · e0b27b3f

Martin KaFai Lau authored Feb 09, 2019

This patch adds a C program to show the usage on
skb->sk and bpf_tcp_sock.
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

e0b27b3f

bpf: Add skb->sk, bpf_sk_fullsock and bpf_tcp_sock tests to test_verifer · fb47d1d9

Martin KaFai Lau authored Feb 09, 2019

This patch tests accessing the skb->sk and the new helpers,
bpf_sk_fullsock and bpf_tcp_sock.

The errstr of some existing "reference tracking" tests is changed
with s/bpf_sock/sock/ and s/socket/sock/ where "sock" is from the
verifier's reg_type_str[].
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

fb47d1d9

bpf: Sync bpf.h to tools/ · 281f9e75

Martin KaFai Lau authored Feb 09, 2019

This patch sync the uapi bpf.h to tools/.
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

281f9e75

bpf: Add struct bpf_tcp_sock and BPF_FUNC_tcp_sock · 655a51e5

Martin KaFai Lau authored Feb 09, 2019

This patch adds a helper function BPF_FUNC_tcp_sock and it
is currently available for cg_skb and sched_(cls|act):

struct bpf_tcp_sock *bpf_tcp_sock(struct bpf_sock *sk);

int cg_skb_foo(struct __sk_buff *skb) {
	struct bpf_tcp_sock *tp;
	struct bpf_sock *sk;
	__u32 snd_cwnd;

	sk = skb->sk;
	if (!sk)
		return 1;

	tp = bpf_tcp_sock(sk);
	if (!tp)
		return 1;

	snd_cwnd = tp->snd_cwnd;
	/* ... */

	return 1;
}

A 'struct bpf_tcp_sock' is also added to the uapi bpf.h to provide
read-only access.  bpf_tcp_sock has all the existing tcp_sock's fields
that has already been exposed by the bpf_sock_ops.
i.e. no new tcp_sock's fields are exposed in bpf.h.

This helper returns a pointer to the tcp_sock.  If it is not a tcp_sock
or it cannot be traced back to a tcp_sock by sk_to_full_sk(), it
returns NULL.  Hence, the caller needs to check for NULL before
accessing it.

The current use case is to expose members from tcp_sock
to allow a cg_skb_bpf_prog to provide per cgroup traffic
policing/shaping.
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

655a51e5

bpf: Refactor sock_ops_convert_ctx_access · 9b1f3d6e

Martin KaFai Lau authored Feb 09, 2019

The next patch will introduce a new "struct bpf_tcp_sock" which
exposes the same tcp_sock's fields already exposed in
"struct bpf_sock_ops".

This patch refactor the existing convert_ctx_access() codes for
"struct bpf_sock_ops" to get them ready to be reused for
"struct bpf_tcp_sock".  The "rtt_min" is not refactored
in this patch because its handling is different from other
fields.

The SOCK_OPS_GET_TCP_SOCK_FIELD is new. All other SOCK_OPS_XXX_FIELD
changes are code move only.
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

9b1f3d6e

bpf: Add state, dst_ip4, dst_ip6 and dst_port to bpf_sock · aa65d696

Martin KaFai Lau authored Feb 09, 2019

This patch adds "state", "dst_ip4", "dst_ip6" and "dst_port" to the
bpf_sock.  The userspace has already been using "state",
e.g. inet_diag (ss -t) and getsockopt(TCP_INFO).

This patch also allows narrow load on the following existing fields:
"family", "type", "protocol" and "src_port".  Unlike IP address,
the load offset is resticted to the first byte for them but it
can be relaxed later if there is a use case.

This patch also folds __sock_filter_check_size() into
bpf_sock_is_valid_access() since it is not called
by any where else.  All bpf_sock checking is in
one place.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

aa65d696

bpf: Add a bpf_sock pointer to __sk_buff and a bpf_sk_fullsock helper · 46f8bc92

Martin KaFai Lau authored Feb 09, 2019

In kernel, it is common to check "skb->sk && sk_fullsock(skb->sk)"
before accessing the fields in sock.  For example, in __netdev_pick_tx:

static u16 __netdev_pick_tx(struct net_device *dev, struct sk_buff *skb,
			    struct net_device *sb_dev)
{
	/* ... */

	struct sock *sk = skb->sk;

		if (queue_index != new_index && sk &&
		    sk_fullsock(sk) &&
		    rcu_access_pointer(sk->sk_dst_cache))
			sk_tx_queue_set(sk, new_index);

	/* ... */

	return queue_index;
}

This patch adds a "struct bpf_sock *sk" pointer to the "struct __sk_buff"
where a few of the convert_ctx_access() in filter.c has already been
accessing the skb->sk sock_common's fields,
e.g. sock_ops_convert_ctx_access().

"__sk_buff->sk" is a PTR_TO_SOCK_COMMON_OR_NULL in the verifier.
Some of the fileds in "bpf_sock" will not be directly
accessible through the "__sk_buff->sk" pointer.  It is limited
by the new "bpf_sock_common_is_valid_access()".
e.g. The existing "type", "protocol", "mark" and "priority" in bpf_sock
     are not allowed.

The newly added "struct bpf_sock *bpf_sk_fullsock(struct bpf_sock *sk)"
can be used to get a sk with all accessible fields in "bpf_sock".
This helper is added to both cg_skb and sched_(cls|act).

int cg_skb_foo(struct __sk_buff *skb) {
	struct bpf_sock *sk;

	sk = skb->sk;
	if (!sk)
		return 1;

	sk = bpf_sk_fullsock(sk);
	if (!sk)
		return 1;

	if (sk->family != AF_INET6 || sk->protocol != IPPROTO_TCP)
		return 1;

	/* some_traffic_shaping(); */

	return 1;
}

(1) The sk is read only

(2) There is no new "struct bpf_sock_common" introduced.

(3) Future kernel sock's members could be added to bpf_sock only
    instead of repeatedly adding at multiple places like currently
    in bpf_sock_ops_md, bpf_sock_addr_md, sk_reuseport_md...etc.

(4) After "sk = skb->sk", the reg holding sk is in type
    PTR_TO_SOCK_COMMON_OR_NULL.

(5) After bpf_sk_fullsock(), the return type will be in type
    PTR_TO_SOCKET_OR_NULL which is the same as the return type of
    bpf_sk_lookup_xxx().

    However, bpf_sk_fullsock() does not take refcnt.  The
    acquire_reference_state() is only depending on the return type now.
    To avoid it, a new is_acquire_function() is checked before calling
    acquire_reference_state().

(6) The WARN_ON in "release_reference_state()" is no longer an
    internal verifier bug.

    When reg->id is not found in state->refs[], it means the
    bpf_prog does something wrong like
    "bpf_sk_release(bpf_sk_fullsock(skb->sk))" where reference has
    never been acquired by calling "bpf_sk_fullsock(skb->sk)".

    A -EINVAL and a verbose are done instead of WARN_ON.  A test is
    added to the test_verifier in a later patch.

    Since the WARN_ON in "release_reference_state()" is no longer
    needed, "__release_reference_state()" is folded into
    "release_reference_state()" also.
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

46f8bc92

bpf: Fix narrow load on a bpf_sock returned from sk_lookup() · 5f456649

Martin KaFai Lau authored Feb 08, 2019

By adding this test to test_verifier:
{
	"reference tracking: access sk->src_ip4 (narrow load)",
	.insns = {
	BPF_SK_LOOKUP,
	BPF_MOV64_REG(BPF_REG_6, BPF_REG_0),
	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 3),
	BPF_LDX_MEM(BPF_H, BPF_REG_2, BPF_REG_0, offsetof(struct bpf_sock, src_ip4) + 2),
	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
	BPF_EMIT_CALL(BPF_FUNC_sk_release),
	BPF_EXIT_INSN(),
	},
	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
	.result = ACCEPT,
},

The above test loads 2 bytes from sk->src_ip4 where
sk is obtained by bpf_sk_lookup_tcp().

It hits an internal verifier error from convert_ctx_accesses():
[root@arch-fb-vm1 bpf]# ./test_verifier 665 665
Failed to load prog 'Invalid argument'!
0: (b7) r2 = 0
1: (63) *(u32 *)(r10 -8) = r2
2: (7b) *(u64 *)(r10 -16) = r2
3: (7b) *(u64 *)(r10 -24) = r2
4: (7b) *(u64 *)(r10 -32) = r2
5: (7b) *(u64 *)(r10 -40) = r2
6: (7b) *(u64 *)(r10 -48) = r2
7: (bf) r2 = r10
8: (07) r2 += -48
9: (b7) r3 = 36
10: (b7) r4 = 0
11: (b7) r5 = 0
12: (85) call bpf_sk_lookup_tcp#84
13: (bf) r6 = r0
14: (15) if r0 == 0x0 goto pc+3
 R0=sock(id=1,off=0,imm=0) R6=sock(id=1,off=0,imm=0) R10=fp0,call_-1 fp-8=????0000 fp-16=0000mmmm fp-24=mmmmmmmm fp-32=mmmmmmmm fp-40=mmmmmmmm fp-48=mmmmmmmm refs=1
15: (69) r2 = *(u16 *)(r0 +26)
16: (bf) r1 = r6
17: (85) call bpf_sk_release#86
18: (95) exit

from 14 to 18: safe
processed 20 insns (limit 131072), stack depth 48
bpf verifier is misconfigured
Summary: 0 PASSED, 0 SKIPPED, 1 FAILED

The bpf_sock_is_valid_access() is expecting src_ip4 can be narrowly
loaded (meaning load any 1 or 2 bytes of the src_ip4) by
marking info->ctx_field_size.  However, this marked
ctx_field_size is not used.  This patch fixes it.

Due to the recent refactoring in test_verifier,
this new test will be added to the bpf-next branch
(together with the bpf_tcp_sock patchset)
to avoid merge conflict.

Fixes: c64b7983 ("bpf: Add PTR_TO_SOCKET verifier type")
Cc: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

5f456649

08 Feb, 2019 23 commits

Merge branch 'btf-api-extensions' · 28bbfc3a

Alexei Starovoitov authored Feb 08, 2019

Andrii Nakryiko says:

====================
This patchset introduces a set of new APIs that make it possible to work with BTF
more effectively (and without involving kernel) for applications like pahole that
need to manipulate .BTF and .BTF.ext data.

Patch #1 changes existing btf__new() API call to only load and initialize
struct btf, while exposing new btf__load() API to attempt to load and validate
BTF in kernel.

Patch #2 adds btf__get_raw_data() API allowing to get access to raw BTF data from
struct btf.

Patch #3 adds similar btf_ext__get_raw_data() API for working with struct btf_ext.

Patch #4 removes not-yet-stable btf__get_strings() API which was added to be able
to test contents of struct btf for btf__dedup(). It's now superseded by raw APIs.

v3->v4:
- formatting fixes
- renamed btf_ext functions/structs to use "setup" language instead of "copy"
- removed btf__get_strings from libbpf.map

v2->v3:
- const void* variants of btf__get_raw_data()
- added btf_ext__get_raw_data()
- removed btf__get_strings() and adapted test_btf.c to use btf__get_raw_data()

v1->v2:
- btf_load() returns just error, not fd
- fix ordering in libbpf.map
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

28bbfc3a

tools/bpf: remove btf__get_strings() superseded by raw data API · 49b57e0d

Andrii Nakryiko authored Feb 08, 2019

Now that we have btf__get_raw_data() it's trivial for tests to iterate
over all strings for testing purposes, which eliminates the need for
btf__get_strings() API.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

49b57e0d

btf: expose API to work with raw btf_ext data · ae4ab4b4

Andrii Nakryiko authored Feb 08, 2019

This patch changes struct btf_ext to retain original data in sequential
block of memory, which makes it possible to expose
btf_ext__get_raw_data() interface similar to btf__get_raw_data(), allowing
users of libbpf to get access to raw representation of .BTF.ext section.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

ae4ab4b4

btf: expose API to work with raw btf data · 02c87446

Andrii Nakryiko authored Feb 08, 2019

This patch exposes new API btf__get_raw_data() that allows to get a copy
of raw BTF data out of struct btf. This is useful for external programs
that need to manipulate raw data, e.g., pahole using btf__dedup() to
deduplicate BTF type info and then writing it back to file.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

02c87446

btf: separate btf creation and loading · d29d87f7

Andrii Nakryiko authored Feb 08, 2019

This change splits out previous btf__new functionality of constructing
struct btf and loading it into kernel into two:
- btf__new() just creates and initializes struct btf
- btf__load() attempts to load existing struct btf into kernel

btf__free will still close BTF fd, if it was ever loaded successfully
into kernel.

This change allows users of libbpf to manipulate BTF using its API,
without the need to unnecessarily load it into kernel.

One of the intended use cases is pahole, which will do DWARF to BTF
conversion and then use libbpf to do type deduplication, while then
handling ELF sections overwriting and other concerns on its own.

Fixes: 2d3feca8 ("bpf: btf: print map dump and lookup with btf info")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

d29d87f7

tools/bpf: add log_level to bpf_load_program_attr · a4021a35

Yonghong Song authored Feb 07, 2019

The kernel verifier has three levels of logs:
    0: no logs
    1: logs mostly useful
  > 1: verbose

Current libbpf API functions bpf_load_program_xattr() and
bpf_load_program() cannot specify log_level.
The bcc, however, provides an interface for user to
specify log_level 2 for verbose output.

This patch added log_level into structure
bpf_load_program_attr, so users, including bcc, can use
bpf_load_program_xattr() to change log_level. The
supported log_level is 0, 1, and 2.

The bpf selftest test_sock.c is modified to enable log_level = 2.
If the "verbose" in test_sock.c is changed to true,
the test will output logs like below:
  $ ./test_sock
  func#0 @0
  0: R1=ctx(id=0,off=0,imm=0) R10=fp0,call_-1
  0: (bf) r6 = r1
  1: R1=ctx(id=0,off=0,imm=0) R6_w=ctx(id=0,off=0,imm=0) R10=fp0,call_-1
  1: (61) r7 = *(u32 *)(r6 +28)
  invalid bpf_context access off=28 size=4

  Test case: bind4 load with invalid access: src_ip6 .. [PASS]
  ...
  Test case: bind6 allow all .. [PASS]
  Summary: 16 PASSED, 0 FAILED

Some test_sock tests are negative tests and verbose verifier
log will be printed out as shown in the above.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

a4021a35

tools/bpf: add missing strings.h include · 62b8cea6

Andrii Nakryiko authored Feb 07, 2019

Few files in libbpf are using bzero() function (defined in strings.h header), but
don't include corresponding header. When libbpf is added as a dependency to pahole,
this undeterministically causes warnings on some machines:

bpf.c:225:2: warning: implicit declaration of function 'bzero' [-Wimplicit-function-declaration]
  bzero(&attr, sizeof(attr));
    ^~~~~
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

62b8cea6

net: fixed-phy: Add fixed_phy_register_with_gpiod() API · 71bd106d

Moritz Fischer authored Feb 07, 2019

Add fixed_phy_register_with_gpiod() API. It lets users create a
fixed_phy instance that uses a GPIO descriptor which was obtained
externally e.g. through platform data.
This enables platform devices (non-DT based) to use GPIOs for link
status.
Signed-off-by: Moritz Fischer <mdf@kernel.org>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

71bd106d

Merge branch 'Add-comphy-support-for-Armada-38x' · a4751093

David S. Miller authored Feb 07, 2019

Russell King says:

====================
Add comphy support for Armada 38x

This series adds support for the comphy for Armada 38x, which allows
these SoCs to use 2500BASE-X mode with appropriate SFP modules.

Tested on SolidRun Clearfog after updating for the 5.0 merge window
changes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

a4751093

ARM: dts: clearfog: add comphy settings for Ethernet interfaces · f548ced1

Russell King authored Feb 07, 2019

Add the comphy settings for the Ethernet interfaces.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

f548ced1

net: marvell: neta: add comphy support · a10c1c81

Russell King authored Feb 07, 2019

Add support for the common phy binding, so that we can reconfigure the
comphy according to the desired ethernet speed. This will allow us to
support 1000base-X and 2500base-X SFPs dynamically on SolidRun Clearfog.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

a10c1c81

dt-bindings: net: mvneta: add phys property · 4ca124f4

Russell King authored Feb 07, 2019

Add an optional phys property to the mvneta binding documentation for
the common phy.
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

4ca124f4

ARM: dts: add description for Armada 38x common phy · f3a6a9f3

Russell King authored Feb 07, 2019

Add the DT description for the Armada 38x common phy.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

f3a6a9f3

phy: armada38x: add common phy support · 14dc100b

Russell King authored Feb 07, 2019

Add support for the Armada 38x common phy to allow us to change the
speed of the Ethernet serdes lane.  This driver only supports
manipulation of the speed, it does not support configuration of the
common phy.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

14dc100b

dt-bindings: phy: Armada 38x common phy bindings · 12038271

Russell King authored Feb 07, 2019

Add the Marvell Armada 38x common phy bindings.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

12038271

Merge branch 'smc-next' · f06f095f

David S. Miller authored Feb 07, 2019

Ursula Braun says:

====================
net/smc: patches 2019-02-07

here are patches for SMC:
* patches 1, 3, and 6 are cleanups without functional change
* patch 2 postpones closing of internal clcsock
* patches 4 and 5 improve link group creation locking
* patch 7 restores AF_SMC as diag_family field
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

f06f095f

net/smc: original socket family in inet_sock_diag · 232dc8ef

Karsten Graul authored Feb 07, 2019

Commit ed75986f ("net/smc: ipv6 support for smc_diag.c") changed the
value of the diag_family field. The idea was to indicate the family of
the IP address in the inet_diag_sockid field. But the change makes it
impossible to distinguish an inet_sock_diag response message from SMC
sock_diag response. This patch restores the original behaviour and sends
AF_SMC as value of the diag_family field.

Fixes: ed75986f ("net/smc: ipv6 support for smc_diag.c")
Reported-by: Eugene Syromiatnikov <esyr@redhat.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

232dc8ef

net/smc: move code to clear the conn->lgr field · 8fc002b0

Karsten Graul authored Feb 07, 2019

The lgr field of an smc_connection is set in smc_conn_create() and
should be cleared in smc_conn_free() for consistency reasons, so move
the responsible code.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8fc002b0

net/smc: use client and server LGR pending locks for SMC-R · 72a36a8a

Hans Wippel authored Feb 07, 2019

If SMC client and server connections are both established at the same
time, smc_connect_rdma() cannot send a CLC confirm message while
smc_listen_work() is waiting for one due to lock contention. This can
result in timeouts in smc_clc_wait_msg() and failed SMC connections.

In case of SMC-R, there are two types of LGRs (client and server LGRs)
which can be protected by separate locks. So, this patch splits the LGR
pending lock into two separate locks for client and server to avoid the
locking issue for SMC-R.
Signed-off-by: Hans Wippel <hwippel@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

72a36a8a

net/smc: unlock LGR pending lock earlier for SMC-D · 62c7139f

Hans Wippel authored Feb 07, 2019

If SMC client and server connections are both established at the same
time, smc_connect_ism() cannot send a CLC confirm message while
smc_listen_work() is waiting for one due to lock contention. This can
result in timeouts in smc_clc_wait_msg() and failed SMC connections.

In case of SMC-D, the LGR pending lock is not needed while
smc_listen_work() is waiting for the CLC confirm message. So, this patch
releases the lock earlier for SMC-D to avoid the locking issue.
Signed-off-by: Hans Wippel <hwippel@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

62c7139f

net/smc: use smc_curs_copy() for SMC-D · a225d2cd

Ursula Braun authored Feb 07, 2019

SMC already provides a wrapper for atomic64 calls to be
architecture independent. Use this wrapper for SMC-D as well.
Reported-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a225d2cd

net/smc: postpone release of clcsock · b03faa1f

Ursula Braun authored Feb 07, 2019

According to RFC7609 (http://www.rfc-editor.org/info/rfc7609)
first the SMC-R connection is shut down and then the normal TCP
connection FIN processing drives cleanup of the internal TCP connection.
The unconditional release of the clcsock during active socket closing
has to be postponed if the peer has not yet signalled socket closing.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b03faa1f

s390/net: move pnet constants · 41c80be2

Ursula Braun authored Feb 07, 2019

There is no need to define these PNETID related constants in
the pnet.h file, since they are just used locally within pnet.c.
Just code cleanup, no functional change.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

41c80be2

07 Feb, 2019 1 commit

net: vxlan: Free a leaked vetoed multicast rdst · fc4aa1ca

Petr Machata authored Feb 07, 2019

When an rdst is rejected by a driver, the current code removes it from
the remote list, but neglects to free it. This is triggered by
tools/testing/selftests/drivers/net/mlxsw/vxlan_fdb_veto.sh and shows as
the following kmemleak trace:

unreferenced object 0xffff88817fa3d888 (size 96):
  comm "softirq", pid 0, jiffies 4372702718 (age 165.252s)
  hex dump (first 32 bytes):
    02 00 00 00 c6 33 64 03 80 f5 a2 61 81 88 ff ff  .....3d....a....
    06 df 71 ae ff ff ff ff 0c 00 00 00 04 d2 6a 6b  ..q...........jk
  backtrace:
    [<00000000296b27ac>] kmem_cache_alloc_trace+0x1ae/0x370
    [<0000000075c86dc6>] vxlan_fdb_append.part.12+0x62/0x3b0 [vxlan]
    [<00000000e0414b63>] vxlan_fdb_update+0xc61/0x1020 [vxlan]
    [<00000000f330c4bd>] vxlan_fdb_add+0x2e8/0x3d0 [vxlan]
    [<0000000008f81c2c>] rtnl_fdb_add+0x4c2/0xa10
    [<00000000bdc4b270>] rtnetlink_rcv_msg+0x6dd/0x970
    [<000000006701f2ce>] netlink_rcv_skb+0x290/0x410
    [<00000000c08a5487>] rtnetlink_rcv+0x15/0x20
    [<00000000d5f54b1e>] netlink_unicast+0x43f/0x5e0
    [<00000000db4336bb>] netlink_sendmsg+0x789/0xcd0
    [<00000000e1ee26b6>] sock_sendmsg+0xba/0x100
    [<00000000ba409802>] ___sys_sendmsg+0x631/0x960
    [<000000003c332113>] __sys_sendmsg+0xea/0x180
    [<00000000f4139144>] __x64_sys_sendmsg+0x78/0xb0
    [<000000006d1ddc59>] do_syscall_64+0x94/0x410
    [<00000000c8defa9a>] entry_SYSCALL_64_after_hwframe+0x49/0xbe

Move vxlan_dst_free() up and schedule a call thereof to plug this leak.

Fixes: 61f46fe8 ("vxlan: Allow vetoing of FDB notifications")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fc4aa1ca