- 22 May, 2020 1 commit
-
-
Björn Töpel authored
Calculating the "data_hard_end" for an XDP buffer coming from AF_XDP zero-copy mode, the return value of xsk_umem_xdp_frame_sz() is added to "data_hard_start". Currently, the chunk size of the UMEM is returned by xsk_umem_xdp_frame_sz(). This is not correct, if the fixed UMEM headroom is non-zero. Fix this by returning the chunk_size without the UMEM headroom. Fixes: 2a637c5b ("xdp: For Intel AF_XDP drivers add XDP frame_sz") Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200520192103.355233-2-bjorn.topel@gmail.com
-
- 19 May, 2020 14 commits
-
-
Andrii Nakryiko authored
b9f4c01f ("selftest/bpf: Make bpf_iter selftest compilable against old vmlinux.h") missed the fact that bpf_iter_test_kern{3,4}.c are not just including bpf_iter_test_kern_common.h and need similar bpf_iter_meta re-definition explicitly. Fixes: b9f4c01f ("selftest/bpf: Make bpf_iter selftest compilable against old vmlinux.h") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200519192341.134360-1-andriin@fb.com
-
Andrii Nakryiko authored
It's good to be able to compile bpf_iter selftest even on systems that don't have the very latest vmlinux.h, e.g., for libbpf tests against older kernels in Travis CI. To that extent, re-define bpf_iter_meta and corresponding bpf_iter context structs in each selftest. To avoid type clashes with vmlinux.h, rename vmlinux.h's definitions to get them out of the way. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/bpf/20200518234516.3915052-1-andriin@fb.com
-
Alexei Starovoitov authored
Sync tools/include/uapi/linux/bpf.h from include/uapi. Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Alexei Starovoitov authored
Daniel Borkmann says: ==================== Trivial patch to add get{peer,sock}name cgroup attach types to the BPF sock_addr programs in order to enable rewriting sockaddr structs from both calls along with libbpf and bpftool support as well as selftests. Thanks! v1 -> v2: - use __u16 for ports in start_server_with_port() signature and in expected_{local,peer} ports in the test case (Andrey) - Added both Andrii's and Andrey's ACKs ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Daniel Borkmann authored
Extend the existing connect_force_port test to assert get{peer,sock}name programs as well. The workflow for e.g. IPv4 is as follows: i) server binds to concrete port, ii) client calls getsockname() on server fd which exposes 1.2.3.4:60000 to client, iii) client connects to service address 1.2.3.4:60000 binds to concrete local address (127.0.0.1:22222) and remaps service address to a concrete backend address (127.0.0.1:60123), iv) client then calls getsockname() on its own fd to verify local address (127.0.0.1:22222) and getpeername() on its own fd which then publishes service address (1.2.3.4:60000) instead of actual backend. Same workflow is done for IPv6 just with different address/port tuples. # ./test_progs -t connect_force_port #14 connect_force_port:OK Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Andrey Ignatov <rdna@fb.com> Link: https://lore.kernel.org/bpf/3343da6ad08df81af715a95d61a84fb4a960f2bf.1589841594.git.daniel@iogearbox.net
-
Daniel Borkmann authored
Make bpftool aware and add the new get{peer,sock}name attach types to its cli, documentation and bash completion to allow attachment/detachment of sock_addr programs there. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Andrey Ignatov <rdna@fb.com> Link: https://lore.kernel.org/bpf/9765b3d03e4c29210c4df56a9cc7e52f5f7bb5ef.1589841594.git.daniel@iogearbox.net
-
Daniel Borkmann authored
Trivial patch to add the new get{peer,sock}name attach types to the section definitions in order to hook them up to sock_addr cgroup program type. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Andrey Ignatov <rdna@fb.com> Link: https://lore.kernel.org/bpf/7fcd4b1e41a8ebb364754a5975c75a7795051bd2.1589841594.git.daniel@iogearbox.net
-
Daniel Borkmann authored
As stated in 983695fa ("bpf: fix unconnected udp hooks"), the objective for the existing cgroup connect/sendmsg/recvmsg/bind BPF hooks is to be transparent to applications. In Cilium we make use of these hooks [0] in order to enable E-W load balancing for existing Kubernetes service types for all Cilium managed nodes in the cluster. Those backends can be local or remote. The main advantage of this approach is that it operates as close as possible to the socket, and therefore allows to avoid packet-based NAT given in connect/sendmsg/recvmsg hooks we only need to xlate sock addresses. This also allows to expose NodePort services on loopback addresses in the host namespace, for example. As another advantage, this also efficiently blocks bind requests for applications in the host namespace for exposed ports. However, one missing item is that we also need to perform reverse xlation for inet{,6}_getname() hooks such that we can return the service IP/port tuple back to the application instead of the remote peer address. The vast majority of applications does not bother about getpeername(), but in a few occasions we've seen breakage when validating the peer's address since it returns unexpectedly the backend tuple instead of the service one. Therefore, this trivial patch allows to customise and adds a getpeername() as well as getsockname() BPF cgroup hook for both IPv4 and IPv6 in order to address this situation. Simple example: # ./cilium/cilium service list ID Frontend Service Type Backend 1 1.2.3.4:80 ClusterIP 1 => 10.0.0.10:80 Before; curl's verbose output example, no getpeername() reverse xlation: # curl --verbose 1.2.3.4 * Rebuilt URL to: 1.2.3.4/ * Trying 1.2.3.4... * TCP_NODELAY set * Connected to 1.2.3.4 (10.0.0.10) port 80 (#0) > GET / HTTP/1.1 > Host: 1.2.3.4 > User-Agent: curl/7.58.0 > Accept: */* [...] After; with getpeername() reverse xlation: # curl --verbose 1.2.3.4 * Rebuilt URL to: 1.2.3.4/ * Trying 1.2.3.4... * TCP_NODELAY set * Connected to 1.2.3.4 (1.2.3.4) port 80 (#0) > GET / HTTP/1.1 > Host: 1.2.3.4 > User-Agent: curl/7.58.0 > Accept: */* [...] Originally, I had both under a BPF_CGROUP_INET{4,6}_GETNAME type and exposed peer to the context similar as in inet{,6}_getname() fashion, but API-wise this is suboptimal as it always enforces programs having to test for ctx->peer which can easily be missed, hence BPF_CGROUP_INET{4,6}_GET{PEER,SOCK}NAME split. Similarly, the checked return code is on tnum_range(1, 1), but if a use case comes up in future, it can easily be changed to return an error code instead. Helper and ctx member access is the same as with connect/sendmsg/etc hooks. [0] https://github.com/cilium/cilium/blob/master/bpf/bpf_sock.cSigned-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Andrey Ignatov <rdna@fb.com> Link: https://lore.kernel.org/bpf/61a479d759b2482ae3efb45546490bacd796a220.1589841594.git.daniel@iogearbox.net
-
Jesper Dangaard Brouer authored
Commit bc56c919 ("bpf: Add xdp.frame_sz in bpf_prog_test_run_xdp().") recently changed bpf_prog_test_run_xdp() to use larger frames for XDP in order to test tail growing frames (via bpf_xdp_adjust_tail) and to have memory backing frame better resemble drivers. The commit contains a bug, as it tries to copy the max data size from userspace, instead of the size provided by userspace. This cause XDP unit tests to fail sporadically with EFAULT, an unfortunate behavior. The fix is to only copy the size specified by userspace. Fixes: bc56c919 ("bpf: Add xdp.frame_sz in bpf_prog_test_run_xdp().") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/158980712729.256597.6115007718472928659.stgit@firesoul
-
Daniel T. Lee authored
Because the previous two commit replaced the bpf_load implementation of the user program with libbpf, the corresponding kernel program's MAP definition can be replaced with new BTF-defined map syntax. This commit only updates the samples which uses libbpf API for loading bpf program not with bpf_load. Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200516040608.1377876-6-danieltimlee@gmail.com
-
Daniel T. Lee authored
This commit adds tracex7 test file (testfile.img) to .gitignore which comes from test_override_return.sh. Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200516040608.1377876-5-danieltimlee@gmail.com
-
Daniel T. Lee authored
BPF tail call uses the BPF_MAP_TYPE_PROG_ARRAY type map for calling into other BPF programs and this PROG_ARRAY should be filled prior to use. Currently, samples with the PROG_ARRAY type MAP fill this program array with bpf_load. For bpf_load to fill this map, kernel BPF program must specify the section with specific format of <prog_type>/<array_idx> (e.g. SEC("socket/0")) But by using libbpf instead of bpf_load, user program can specify which programs should be added to PROG_ARRAY. The advantage of this approach is that you can selectively add only the programs you want, rather than adding all of them to PROG_ARRAY, and it's much more intuitive than the traditional approach. This commit refactors user programs with the PROG_ARRAY type MAP with libbpf instead of using bpf_load. Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200516040608.1377876-4-danieltimlee@gmail.com
-
Daniel T. Lee authored
Currently, the kprobe BPF program attachment method for bpf_load is quite old. The implementation of bpf_load "directly" controls and manages(create, delete) the kprobe events of DEBUGFS. On the other hand, using using the libbpf automatically manages the kprobe event. (under bpf_link interface) By calling bpf_program__attach(_kprobe) in libbpf, the corresponding kprobe is created and the BPF program will be attached to this kprobe. To remove this, by simply invoking bpf_link__destroy will clean up the event. This commit refactors kprobe tracing programs (tracex{1~7}_user.c) with libbpf using bpf_link interface and bpf_program__attach. tracex2_kern.c, which tracks system calls (sys_*), has been modified to append prefix depending on architecture. Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200516040608.1377876-3-danieltimlee@gmail.com
-
Daniel T. Lee authored
Current method of checking pointer error is not user friendly. Especially the __must_check define makes this less intuitive. Since, libbpf has an API libbpf_get_error() which checks pointer error, this commit refactors existing pointer error check logic with libbpf. Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200516040608.1377876-2-danieltimlee@gmail.com
-
- 16 May, 2020 10 commits
-
-
John Fastabend authored
Until now we have only had minimal ktls+sockmap testing when being used with helpers and different sendmsg/sendpage patterns. Add a pass with ktls here. To run just ktls tests, $ ./test_sockmap --whitelist="ktls" Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/158939736278.15176.5435314315563203761.stgit@john-Precision-5820-Tower
-
John Fastabend authored
This adds a blacklist to test_sockmap. For example, now we can run all apply and cork tests except those with timeouts by doing, $ ./test_sockmap --whitelist "apply,cork" --blacklist "hang" Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/158939734350.15176.6643981099665208826.stgit@john-Precision-5820-Tower
-
John Fastabend authored
Allow running specific tests with a comma deliminated whitelist. For example to run all apply and cork tests. $ ./test_sockmap --whitelist="cork,apply" Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/158939732464.15176.1959113294944564542.stgit@john-Precision-5820-Tower
-
John Fastabend authored
Pass options from command line args into individual tests which allows us to use verbose option from command line with selftests. Now when verbose option is set individual subtest details will be printed. Also we can consolidate cgroup bring up and tear down. Additionally just setting verbose is very noisy so introduce verbose=1 and verbose=2. Really verbose=2 is only useful when developing tests or debugging some specific issue. For example now we get output like this with --verbose, #20/17 sockhash:txmsg test pull-data:OK [TEST 160]: (512, 1, 3, sendpage, pop (1,3),): msg_loop_rx: iov_count 1 iov_buf 1 cnt 512 err 0 [TEST 161]: (100, 1, 5, sendpage, pop (1,3),): msg_loop_rx: iov_count 1 iov_buf 3 cnt 100 err 0 [TEST 162]: (2, 1024, 256, sendpage, pop (4096,8192),): msg_loop_rx: iov_count 1 iov_buf 255 cnt 2 err 0 [TEST 163]: (512, 1, 3, sendpage, redir,pop (1,3),): msg_loop_rx: iov_count 1 iov_buf 1 cnt 512 err 0 [TEST 164]: (100, 1, 5, sendpage, redir,pop (1,3),): msg_loop_rx: iov_count 1 iov_buf 3 cnt 100 err 0 [TEST 165]: (512, 1, 3, sendpage, cork 512,pop (1,3),): msg_loop_rx: iov_count 1 iov_buf 1 cnt 512 err 0 [TEST 166]: (100, 1, 5, sendpage, cork 512,pop (1,3),): msg_loop_rx: iov_count 1 iov_buf 3 cnt 100 err 0 [TEST 167]: (512, 1, 3, sendpage, redir,cork 4,pop (1,3),): msg_loop_rx: iov_count 1 iov_buf 1 cnt 512 err 0 [TEST 168]: (100, 1, 5, sendpage, redir,cork 4,pop (1,3),): msg_loop_rx: iov_count 1 iov_buf 3 cnt 100 err 0 Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/158939730412.15176.1975675235035143367.stgit@john-Precision-5820-Tower
-
John Fastabend authored
At the moment test_sockmap runs all 800+ tests ungrouped which is not ideal because it makes it hard to see what is failing but also more importantly its hard to confirm all cases are tested. Additionally, after inspecting we noticed the runtime is bloated because we run many duplicate tests. Worse some of these tests are known error cases that wait for the recvmsg handler to timeout which creats long delays. Also we noted some tests were not clearing their options and as a result the following tests would run with extra and incorrect options. Fix this by reorganizing test code so its clear what tests are running and when. Then it becomes easy to remove duplication and run tests with only the set of send/recv patterns that are relavent. To accomplish this break test_sockmap into subtests and remove unnecessary duplication. The output is more readable now and the runtime reduced. Now default output prints subtests like this, $ ./test_sockmap # 1/ 6 sockmap:txmsg test passthrough:OK ... #22/ 1 sockhash:txmsg test push/pop data:OK Pass: 22 Fail: 0 Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/158939728384.15176.13601520183665880762.stgit@john-Precision-5820-Tower
-
John Fastabend authored
The recv thread in test_sockmap waits to receive all bytes from sender but in the case we use pop data it may wait for more bytes then actually being sent. This stalls the test harness for multiple seconds. Because this happens in multiple tests it slows time to run the selftest. Fix by doing a better job of accounting for total bytes when pop helpers are used. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/158939726542.15176.5964532245173539540.stgit@john-Precision-5820-Tower
-
John Fastabend authored
Its helpful to know the error value if an error occurs. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/158939724566.15176.12079885932643225626.stgit@john-Precision-5820-Tower
-
John Fastabend authored
Running test_sockmap with arguments to specify a test pattern requires including a cgroup argument. Instead of requiring this if the option is not provided create one This is not used by selftest runs but I use it when I want to test a specific test. Most useful when developing new code and/or tests. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/158939722675.15176.6294210959489131688.stgit@john-Precision-5820-Tower
-
John Fastabend authored
The prints in the test_sockmap programs were only useful when we didn't have enough control over test infrastructure to know from user program what was being pushed into kernel side. Now that we have or will shortly have better test controls lets remove the printers. This means we can remove half the programs and cleanup bpf side. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/158939720756.15176.9806965887313279429.stgit@john-Precision-5820-Tower
-
John Fastabend authored
Moves test_sockmap_kern.h into progs directory but does not change code at all. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/158939718921.15176.5766299102332077086.stgit@john-Precision-5820-Tower
-
- 15 May, 2020 15 commits
-
-
Stanislav Fomichev authored
There is a much higher chance we can see the regressions if the test is part of test_progs. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200515194904.229296-2-sdf@google.com
-
Stanislav Fomichev authored
Commit 294f2fc6 ("bpf: Verifer, adjust_scalar_min_max_vals to always call update_reg_bounds()") changed the way verifier logs some of its state, adjust the test_align accordingly. Where possible, I tried to not copy-paste the entire log line and resorted to dropping the last closing brace instead. Fixes: 294f2fc6 ("bpf: Verifer, adjust_scalar_min_max_vals to always call update_reg_bounds()") Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200515194904.229296-1-sdf@google.com
-
Ian Rogers authored
Fixes the following warnings: hashmap.c: In function ‘hashmap__clear’: hashmap.h:150:20: error: comparison of integer expressions of different signedness: ‘int’ and ‘size_t’ {aka ‘long unsigned int’} [-Werror=sign-compare] 150 | for (bkt = 0; bkt < map->cap; bkt++) \ hashmap.c: In function ‘hashmap_grow’: hashmap.h:150:20: error: comparison of integer expressions of different signedness: ‘int’ and ‘size_t’ {aka ‘long unsigned int’} [-Werror=sign-compare] 150 | for (bkt = 0; bkt < map->cap; bkt++) \ Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200515165007.217120-4-irogers@google.com
-
Ian Rogers authored
Remove #include of libbpf_internal.h that is unused. Discussed in this thread: https://lore.kernel.org/lkml/CAEf4BzZRmiEds_8R8g4vaAeWvJzPb4xYLnpF0X2VNY8oTzkphQ@mail.gmail.com/Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200515165007.217120-3-irogers@google.com
-
Daniel Borkmann authored
As per 15d83c4d ("bpf: Allow loading of a bpf_iter program") we only allow a range of [0,1] for return codes. Therefore BPF_TRACE_ITER relies on the default tnum_range(0, 1) which is set in range var. On recent merge of net into net-next commit e92888c7 ("bpf: Enforce returning 0 for fentry/fexit progs") got pulled in and caused a merge conflict with the changes from 15d83c4d. The resolution had a snall hiccup in that it removed the [0,1] range restriction again so that BPF_TRACE_ITER would have no enforcement. Fix it by adding it back. Fixes: da07f52d ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller authored
Move the bpf verifier trace check into the new switch statement in HEAD. Resolve the overlapping changes in hinic, where bug fixes overlap the addition of VF support. Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds authored
Pull networking fixes from David Miller: 1) Fix sk_psock reference count leak on receive, from Xiyu Yang. 2) CONFIG_HNS should be invisible, from Geert Uytterhoeven. 3) Don't allow locking route MTUs in ipv6, RFCs actually forbid this, from Maciej Żenczykowski. 4) ipv4 route redirect backoff wasn't actually enforced, from Paolo Abeni. 5) Fix netprio cgroup v2 leak, from Zefan Li. 6) Fix infinite loop on rmmod in conntrack, from Florian Westphal. 7) Fix tcp SO_RCVLOWAT hangs, from Eric Dumazet. 8) Various bpf probe handling fixes, from Daniel Borkmann. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (68 commits) selftests: mptcp: pm: rm the right tmp file dpaa2-eth: properly handle buffer size restrictions bpf: Restrict bpf_trace_printk()'s %s usage and add %pks, %pus specifier bpf: Add bpf_probe_read_{user, kernel}_str() to do_refine_retval_range bpf: Restrict bpf_probe_read{, str}() only to archs where they work MAINTAINERS: Mark networking drivers as Maintained. ipmr: Add lockdep expression to ipmr_for_each_table macro ipmr: Fix RCU list debugging warning drivers: net: hamradio: Fix suspicious RCU usage warning in bpqether.c net: phy: broadcom: fix BCM54XX_SHD_SCR3_TRDDAPD value for BCM54810 tcp: fix error recovery in tcp_zerocopy_receive() MAINTAINERS: Add Jakub to networking drivers. MAINTAINERS: another add of Karsten Graul for S390 networking drivers: ipa: fix typos for ipa_smp2p structure doc pppoe: only process PADT targeted at local interfaces selftests/bpf: Enforce returning 0 for fentry/fexit programs bpf: Enforce returning 0 for fentry/fexit progs net: stmmac: fix num_por initialization security: Fix the default value of secid_to_secctx hook libbpf: Fix register naming in PT_REGS s390 macros ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds authored
Pull rdma fixes from Jason Gunthorpe: "A few minor bug fixes for user visible defects, and one regression: - Various bugs from static checkers and syzkaller - Add missing error checking in mlx4 - Prevent RTNL lock recursion in i40iw - Fix segfault in cxgb4 in peer abort cases - Fix a regression added in 5.7 where the IB_EVENT_DEVICE_FATAL could be lost, and wasn't delivered to all the FDs" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/uverbs: Move IB_EVENT_DEVICE_FATAL to destroy_uobj RDMA/uverbs: Do not discard the IB_EVENT_DEVICE_FATAL event RDMA/iw_cxgb4: Fix incorrect function parameters RDMA/core: Fix double put of resource IB/core: Fix potential NULL pointer dereference in pkey cache IB/hfi1: Fix another case where pq is left on waitlist IB/i40iw: Remove bogus call to netdev_master_upper_dev_get() IB/mlx4: Test return value of calls to ib_get_cached_pkey RDMA/rxe: Always return ERR_PTR from rxe_create_mmap_info() i40iw: Fix error handling in i40iw_manage_arp_cache()
-
Linus Torvalds authored
Merge tag 'linux-kselftest-5.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull kselftest fixes from Shuah Khan: - lkdtm runner fixes to prevent dmesg clearing and shellcheck errors - ftrace test handling when test module doesn't exist - nsfs test fix to replace zero-length array with flexible-array - dmabuf-heaps test fix to return clear error value * tag 'linux-kselftest-5.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests/lkdtm: Use grep -E instead of egrep selftests/lkdtm: Don't clear dmesg when running tests selftests/ftrace: mark irqsoff_tracer.tc test as unresolved if the test module does not exist tools/testing: Replace zero-length array with flexible-array kselftests: dmabuf-heaps: Fix confused return value on expected error testing
-
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linuxLinus Torvalds authored
Pull RISC-V fixes from Palmer Dabbelt: "A handful of build fixes, all found by Huawei's autobuilder. None of these patches should have any functional impact on kernels that build, and they're mostly related to various features intermingling with !MMU. While some of these might be better hoisted to generic code, it seems better to have the simple fixes in the meanwhile. As far as I know these are the only outstanding patches for 5.7" * tag 'riscv-for-linus-5.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: mmiowb: Fix implicit declaration of function 'smp_processor_id' riscv: pgtable: Fix __kernel_map_pages build error if NOMMU riscv: Make SYS_SUPPORTS_HUGETLBFS depends on MMU riscv: Disable ARCH_HAS_DEBUG_VIRTUAL if NOMMU riscv: Add pgprot_writecombine/device and PAGE_SHARED defination if NOMMU riscv: stacktrace: Fix undefined reference to `walk_stackframe' riscv: Fix unmet direct dependencies built based on SOC_VIRT riscv: perf: RISCV_BASE_PMU should be independent riscv: perf_event: Make some funciton static
-
David S. Miller authored
Paolo Abeni says: ==================== mptcp: fix MP_JOIN failure handling Currently if we hit an MP_JOIN failure on the third ack, the child socket is closed with reset, but the request socket is not deleted, causing weird behaviors. The main problem is that MPTCP's MP_JOIN code needs to plug it's own 'valid 3rd ack' checks and the current TCP callbacks do not allow that. This series tries to address the above shortcoming introducing a new MPTCP specific bit in a 'struct tcp_request_sock' hole, and leveraging that to allow tcp_check_req releasing the request socket when needed. The above allows cleaning-up a bit current MPTCP hooking in tcp_check_req(). An alternative solution, possibly cleaner but more invasive, would be changing the 'bool *own_req' syn_recv_sock() argument into 'int *req_status' and let MPTCP set it to 'REQ_DROP'. v1 -> v2: - be more conservative about drop_req initialization RFC -> v1: - move the drop_req bit inside tcp_request_sock (Eric) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Paolo Abeni authored
Currently, on MP_JOIN failure we reset the child socket, but leave the request socket untouched. tcp_check_req will deal with it according to the 'tcp_abort_on_overflow' sysctl value - by default the req socket will stay alive. The above leads to inconsistent behavior on MP JOIN failure, and bad listener overflow accounting. This patch addresses the issue leveraging the infrastructure just introduced to ask the TCP stack to drop the req on failure. The child socket is not freed anymore by subflow_syn_recv_sock(), instead it's moved to a dead state and will be disposed by the next sock_put done by the TCP stack, so that listener overflow accounting is not affected by MP JOIN failure. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Christoph Paasch <cpaasch@apple.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Paolo Abeni authored
Move the steps to prepare an inet_connection_sock for forced disposal inside a separate helper. No functional changes inteded, this will just simplify the next patch. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Christoph Paasch <cpaasch@apple.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Paolo Abeni authored
MP_JOIN subflows must not land into the accept queue. Currently tcp_check_req() calls an mptcp specific helper to detect such scenario. Such helper leverages the subflow context to check for MP_JOIN subflows. We need to deal also with MP JOIN failures, even when the subflow context is not available due allocation failure. A possible solution would be changing the syn_recv_sock() signature to allow returning a more descriptive action/ error code and deal with that in tcp_check_req(). Since the above need is MPTCP specific, this patch instead uses a TCP request socket hole to add a MPTCP specific flag. Such flag is used by the MPTCP syn_recv_sock() to tell tcp_check_req() how to deal with the request socket. This change is a no-op for !MPTCP build, and makes the MPTCP code simpler. It allows also the next patch to deal correctly with MP JOIN failure. v1 -> v2: - be more conservative on drop_req initialization (Mat) RFC -> v1: - move the drop_req bit inside tcp_request_sock (Eric) Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Reviewed-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linuxLinus Torvalds authored
Pull arm64 fix from Catalin Marinas: "Fix flush_icache_range() second argument in machine_kexec() to be an address rather than size" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: fix the flush_icache_range arguments in machine_kexec
-