- 30 Sep, 2023 1 commit
-
-
Hou Tao authored
Commit d52b5931 ("bpf: Adjust size_index according to the value of KMALLOC_MIN_SIZE") uses KMALLOC_MIN_SIZE to adjust size_index, but as reported by Nathan, the adjustment is not enough, because __kmalloc_minalign() also decides the minimal alignment of a slab object, as shown in new_kmalloc_cache(), and its value may be greater than KMALLOC_MIN_SIZE (e.g., 64 bytes vs 8 bytes under a riscv QEMU VM). Instead of invoking __kmalloc_minalign() in the bpf subsystem to find the maximal alignment, just use kmalloc_size_roundup() directly to get the corresponding slab object size for each allocation size. If these two sizes are unmatched, adjust size_index to select a bpf_mem_cache with unit_size equal to the object_size of the underlying slab cache for the allocation size. Fixes: 822fb26b ("bpf: Add a hint to allocated objects.") Reported-by: Nathan Chancellor <nathan@kernel.org> Closes: https://lore.kernel.org/bpf/20230914181407.GA1000274@dev-arch.thelio-3990X/ Signed-off-by: Hou Tao <houtao1@huawei.com> Tested-by: Emil Renner Berthing <emil.renner.berthing@canonical.com> Link: https://lore.kernel.org/r/20230928101558.2594068-1-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
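A minimal sketch of the idea described above (not the actual kernel hunk; the adjust_size_index() name and bpf_mem_cache details are illustrative assumptions): kmalloc_size_roundup() reports the real slab object size for a given request, and size_index is retargeted whenever that differs from the nominal size.

    /* Hedged sketch, not the real patch: for each small allocation size,
     * ask the slab layer what object size a kmalloc() of that size would
     * really return, and retarget size_index at the bpf_mem_cache whose
     * unit_size matches it.
     */
    static void adjust_size_index(void)
    {
            size_t size, object_size;

            for (size = 8; size <= 256; size += 8) {
                    object_size = kmalloc_size_roundup(size);
                    if (object_size != size) {
                            /* pick the bpf_mem_cache whose unit_size
                             * equals object_size for this request size
                             */
                    }
            }
    }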
-
- 29 Sep, 2023 6 commits
-
-
Daniel Borkmann authored
Add various tests to check maximum number of supported programs being attached: # ./vmtest.sh -- ./test_progs -t tc_opts [...] ./test_progs -t tc_opts [ 1.185325] bpf_testmod: loading out-of-tree module taints kernel. [ 1.186826] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel [ 1.270123] tsc: Refined TSC clocksource calibration: 3407.988 MHz [ 1.272428] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x311fc932722, max_idle_ns: 440795381586 ns [ 1.276408] clocksource: Switched to clocksource tsc #252 tc_opts_after:OK #253 tc_opts_append:OK #254 tc_opts_basic:OK #255 tc_opts_before:OK #256 tc_opts_chain_classic:OK #257 tc_opts_chain_mixed:OK #258 tc_opts_delete_empty:OK #259 tc_opts_demixed:OK #260 tc_opts_detach:OK #261 tc_opts_detach_after:OK #262 tc_opts_detach_before:OK #263 tc_opts_dev_cleanup:OK #264 tc_opts_invalid:OK #265 tc_opts_max:OK <--- (new test) #266 tc_opts_mixed:OK #267 tc_opts_prepend:OK #268 tc_opts_replace:OK #269 tc_opts_revision:OK Summary: 18/0 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20230929204121.20305-2-daniel@iogearbox.net
-
Daniel Borkmann authored
After Paul's recent improvement to syzkaller to improve coverage for bpf_mprog and tcx, it hit a splat that the program limit was surpassed. What happened is that the maximum number of progs got added, followed by another prog add request which adds with BPF_F_BEFORE flag relative to the last program in the array. The idx >= bpf_mprog_max() check in bpf_mprog_attach() still passes because the index is below the maximum but the maximum will be surpassed. We need to add a check upfront for insertions to catch this situation. Fixes: 053c8e1f ("bpf: Add generic attach/detach/query API for multi-progs") Reported-by: syzbot+baa44e3dbbe48e05c1ad@syzkaller.appspotmail.com Reported-by: syzbot+b97d20ed568ce0951a06@syzkaller.appspotmail.com Reported-by: syzbot+2558ca3567a77b7af4e3@syzkaller.appspotmail.com Co-developed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Tested-by: syzbot+baa44e3dbbe48e05c1ad@syzkaller.appspotmail.com Tested-by: syzbot+b97d20ed568ce0951a06@syzkaller.appspotmail.com Link: https://github.com/google/syzkaller/pull/4207 Link: https://lore.kernel.org/bpf/20230929204121.20305-1-daniel@iogearbox.net
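A hedged sketch of such an upfront guard (helper names follow the bpf_mprog API, but treat the exact placement as illustrative): the attach request is rejected before any BPF_F_BEFORE/BPF_F_AFTER index handling once the array is already full.

    /* Hedged sketch: refuse new insertions when already at capacity,
     * before any relative-position handling can bypass the limit.
     */
    if (bpf_mprog_total(entry) >= bpf_mprog_max())
            return -ERANGE;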
-
Jakub Sitnicki authored
With a SOCKMAP/SOCKHASH map and an sk_msg program a user can steer messages sent from one TCP socket (s1) to actually egress from another TCP socket (s2): tcp_bpf_sendmsg(s1) // = sk_prot->sendmsg tcp_bpf_send_verdict(s1) // __SK_REDIRECT case tcp_bpf_sendmsg_redir(s2) tcp_bpf_push_locked(s2) tcp_bpf_push(s2) tcp_rate_check_app_limited(s2) // expects tcp_sock tcp_sendmsg_locked(s2) // ditto There is a hard-coded assumption in the call-chain that the egress socket (s2) is a TCP socket. However, in commit 122e6c79 ("sock_map: Update sock type checks for UDP") we have enabled redirects to non-TCP sockets. This was done for the sake of BPF sk_skb programs. There was no intention to support the sk_msg send-to-egress use case. As a result, attempts to send-to-egress through a non-TCP socket lead to a crash due to an invalid downcast from sock to tcp_sock: BUG: kernel NULL pointer dereference, address: 000000000000002f ... Call Trace: <TASK> ? show_regs+0x60/0x70 ? __die+0x1f/0x70 ? page_fault_oops+0x80/0x160 ? do_user_addr_fault+0x2d7/0x800 ? rcu_is_watching+0x11/0x50 ? exc_page_fault+0x70/0x1c0 ? asm_exc_page_fault+0x27/0x30 ? tcp_tso_segs+0x14/0xa0 tcp_write_xmit+0x67/0xce0 __tcp_push_pending_frames+0x32/0xf0 tcp_push+0x107/0x140 tcp_sendmsg_locked+0x99f/0xbb0 tcp_bpf_push+0x19d/0x3a0 tcp_bpf_sendmsg_redir+0x55/0xd0 tcp_bpf_send_verdict+0x407/0x550 tcp_bpf_sendmsg+0x1a1/0x390 inet_sendmsg+0x6a/0x70 sock_sendmsg+0x9d/0xc0 ? sockfd_lookup_light+0x12/0x80 __sys_sendto+0x10e/0x160 ? syscall_enter_from_user_mode+0x20/0x60 ? __this_cpu_preempt_check+0x13/0x20 ? lockdep_hardirqs_on+0x82/0x110 __x64_sys_sendto+0x1f/0x30 do_syscall_64+0x38/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd Reject selecting a non-TCP socket as the redirect target from a BPF sk_msg program to prevent the crash. When attempted, the user will receive an EACCES error from the send/sendto/sendmsg() syscall. Fixes: 122e6c79 ("sock_map: Update sock type checks for UDP") Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20230920102055.42662-1-jakub@cloudflare.com
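A hedged sketch of the kind of guard described (not necessarily the exact hunk; the sk_redir variable name is an assumption): the redirect target is checked with sk_is_tcp() before the sk_msg egress path is entered, so a non-TCP target is rejected instead of being downcast to tcp_sock.

    /* Hedged sketch: refuse a non-TCP egress target for an sk_msg
     * redirect, since tcp_bpf_push() ends in tcp_sendmsg_locked()
     * and assumes a tcp_sock. 'sk_redir' is an illustrative name.
     */
    if (!sk_is_tcp(sk_redir))
            return -EACCES;   /* surfaces as EACCES from send*() */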
-
John Fastabend authored
Test that we can read with MSG_F_PEEK and then still get the correct number of available bytes through FIONREAD. The recv() (without PEEK) then returns the bytes as expected. The recv() always worked though, because it was just the available-byte reporting that was broken before the latest fixes. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20230926035300.135096-4-john.fastabend@gmail.com
-
John Fastabend authored
When data is peek'd off the receive queue we shouldn't consider it copied from the tcp_sock side. If we increment copied_seq this will confuse tcp_data_ready(), because copied_seq can be arbitrarily increased. From the application side it results in poll() operations not waking up when expected. Notice the tcp stack without BPF recvmsg programs also does not increment copied_seq. We broke this when we moved copied_seq into recvmsg to only update when an actual copy was happening. But it wasn't working correctly before that either, because tcp_data_ready() tried to use the copied_seq value to see if data had been read by the user yet. See the fixes tags. Fixes: e5c6de5f ("bpf, sockmap: Incorrectly handling copied_seq") Fixes: 04919bed ("tcp: Introduce tcp_read_skb()") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20230926035300.135096-3-john.fastabend@gmail.com
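A minimal sketch of the behaviour described, under the assumption that the fix lives in the sockmap recvmsg path ('sk', 'flags' and 'copied' are illustrative names): copied_seq is only advanced when the read actually consumes data, never for MSG_PEEK.

    /* Hedged sketch: advance copied_seq only for a consuming read.
     * 'copied' is the number of bytes returned to userspace.
     */
    if (copied > 0 && !(flags & MSG_PEEK))
            WRITE_ONCE(tcp_sk(sk)->copied_seq,
                       tcp_sk(sk)->copied_seq + copied);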
-
John Fastabend authored
Before fix e5c6de5f tcp_read_skb() would increment the tp->copied_seq value. This (as described in the commit) would cause an error for apps because once that is incremented the application might believe there is no data to be read. Then some apps would stall or abort believing no data is available. However, the fix is incomplete because it introduces another issue in the skb dequeue. The loop does tcp_recv_skb() in a while loop to consume as many skbs as possible. The problem is the call is ... tcp_recv_skb(sk, seq, &offset) ... where 'seq' is: u32 seq = tp->copied_seq; Now we can hit a case where we've not yet incremented copied_seq from the BPF side, but then tcp_recv_skb() fails this test ... if (offset < skb->len || (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)) ... so that instead of returning the skb we call tcp_eat_recv_skb() which frees the skb. This is because the routine believes the SKB has been collapsed per the comment: /* This looks weird, but this can happen if TCP collapsing * splitted a fat GRO packet, while we released socket lock * in skb_splice_bits() */ This can't happen here: we've unlinked the full SKB and orphaned it. Anyway, it would confuse any BPF programs if the data were suddenly moved underneath them. To fix this situation do the simpler operation and just skb_peek() the skb at the head of the queue, followed by the unlink. It shouldn't need to check this condition, and tcp_read_skb() reads entire skbs, so there is no need to handle the 'offset!=0' case as we would see in tcp_read_sock(). Fixes: e5c6de5f ("bpf, sockmap: Incorrectly handling copied_seq") Fixes: 04919bed ("tcp: Introduce tcp_read_skb()") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20230926035300.135096-2-john.fastabend@gmail.com
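A hedged sketch of the simpler dequeue described above (illustrative, not the exact hunk): peek the head of the receive queue and unlink it, rather than re-looking the skb up via tcp_recv_skb() with a possibly stale copied_seq.

    /* Hedged sketch: consume skbs directly from the head of the
     * receive queue; no offset handling is needed since whole skbs
     * are handed to the BPF recv_actor.
     */
    struct sk_buff *skb;

    while ((skb = skb_peek(&sk->sk_receive_queue)) != NULL) {
            __skb_unlink(skb, &sk->sk_receive_queue);
            /* hand the full skb to the recv_actor here */
    }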
-
- 20 Sep, 2023 1 commit
-
-
Andrii Nakryiko authored
In mark_chain_precision() logic, when we reach the entry to a global func, it is expected that R1-R5 might still be requested to be marked precise. This would correspond to some integer input arguments being tracked as precise. This is all expected and handled as a special case. What's not expected is that we'll leave the backtrack_state structure with some register bits set. This is because for subsequent precision propagations backtrack_state is reused without clearing masks, as all code paths are carefully written in a way to leave an empty backtrack_state with zeroed-out masks, for speed. The fix is trivial: we always clear the register bit in the register mask, and then, optionally, set reg->precise if the register is of SCALAR_VALUE type. Reported-by: Chris Mason <clm@meta.com> Fixes: be2ef816 ("bpf: allow precision tracking for programs with subprogs") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230918210110.2241458-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
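A hedged sketch of the fix's shape (helper names follow the verifier's backtracking code, but treat the exact lines as illustrative): the register is always dropped from the backtracking mask at the global-function boundary, and only then is the precision flag optionally set.

    /* Hedged sketch: never leave stale bits in the backtrack mask. */
    bt_clear_reg(bt, i);                    /* always clear the mask bit */
    if (reg->type == SCALAR_VALUE)
            reg->precise = true;            /* optionally mark precise  */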
-
- 19 Sep, 2023 5 commits
-
-
Leon Hwang authored
Fix a 'tr' dereferencing bug when CONFIG_BPF_JIT is turned off. When CONFIG_BPF_JIT is turned off, 'bpf_trampoline_get()' returns NULL, which is the same as the cases when CONFIG_BPF_JIT is turned on. Closes: https://lore.kernel.org/r/202309131936.5Nc8eUD0-lkp@intel.com/ Fixes: f7b12b6f ("bpf: verifier: refactor check_attach_btf_id()") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Leon Hwang <hffilwlqm@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20230917153846.88732-1-hffilwlqm@gmail.com
-
Alexei Starovoitov authored
Song Liu says: ==================== s390/bpf: Fix arch_prepare_bpf_trampoline While working on trampolines, I found that s390's arch_prepare_bpf_trampoline returns 0 on success, which breaks struct_ops. However, the CI doesn't catch this issue. It turns out test_progs:bpf_tcp_ca doesn't really test that members of a struct_ops are actually called via the trampolines. 1/2 fixes arch_prepare_bpf_trampoline for s390. 2/2 adds a check to test_progs:bpf_tcp_ca to verify bpf_cubic_acked() is indeed called by the trampoline. Without 1/2, this check would fail on s390. ==================== Link: https://lore.kernel.org/r/20230919060258.3237176-1-song@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Song Liu authored
Test bpf_tcp_ca (in test_progs) checks multiple tcp_congestion_ops. However, there isn't a test that verifies the functions in the tcp_congestion_ops are actually called. Add a check to verify that bpf_cubic_acked is actually called during the test. Suggested-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Song Liu <song@kernel.org> Acked-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20230919060258.3237176-3-song@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Song Liu authored
arch_prepare_bpf_trampoline() for s390 currently returns 0 on success. This is not a problem for a regular trampoline. However, struct_ops relies on the return value to advance the "image" pointer: bpf_struct_ops_map_update_elem() { ... for_each_member(i, t, member) { ... err = bpf_struct_ops_prepare_trampoline(); ... image += err; } } When arch_prepare_bpf_trampoline returns 0 on success, all members of the struct_ops will point to the same trampoline (the last one). Fix this by returning the program size in arch_prepare_bpf_trampoline (on success). This is the same behavior as other architectures. Signed-off-by: Song Liu <song@kernel.org> Fixes: 528eb2cb ("s390/bpf: Implement arch_prepare_bpf_trampoline()") Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20230919060258.3237176-2-song@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
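A hedged, simplified sketch of the return convention the fix restores (hypothetical signature, not the s390 implementation): the hook reports how many image bytes it emitted, so the struct_ops loop quoted above can advance its image pointer member by member.

    /* Hedged sketch: return bytes emitted on success, never 0. */
    static int prepare_trampoline_sketch(void *image, void *image_end)
    {
            int emitted = 0;

            /* emit trampoline instructions into 'image', bumping 'emitted' */

            if ((char *)image + emitted > (char *)image_end)
                    return -EFBIG;

            return emitted;         /* callers do: image += err; */
    }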
-
Jisheng Zhang authored
commit 133466c3 ("net: stmmac: use per-queue 64 bit statistics where necessary") caused one regression, as found by Uwe; the backtrace looks like: INFO: trying to register non-static key. The code is fine but needs lockdep annotation, or maybe you didn't initialize this object before use? turning off the locking correctness validator. CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.5.0-rc1-00449-g133466c3-dirty #21 Hardware name: STM32 (Device Tree Support) unwind_backtrace from show_stack+0x18/0x1c show_stack from dump_stack_lvl+0x60/0x90 dump_stack_lvl from register_lock_class+0x98c/0x99c register_lock_class from __lock_acquire+0x74/0x293c __lock_acquire from lock_acquire+0x134/0x398 lock_acquire from stmmac_get_stats64+0x2ac/0x2fc stmmac_get_stats64 from dev_get_stats+0x44/0x130 dev_get_stats from rtnl_fill_stats+0x38/0x120 rtnl_fill_stats from rtnl_fill_ifinfo+0x834/0x17f4 rtnl_fill_ifinfo from rtmsg_ifinfo_build_skb+0xc0/0x144 rtmsg_ifinfo_build_skb from rtmsg_ifinfo+0x50/0x88 rtmsg_ifinfo from __dev_notify_flags+0xc0/0xec __dev_notify_flags from dev_change_flags+0x50/0x5c dev_change_flags from ip_auto_config+0x2f4/0x1260 ip_auto_config from do_one_initcall+0x70/0x35c do_one_initcall from kernel_init_freeable+0x2ac/0x308 kernel_init_freeable from kernel_init+0x1c/0x138 kernel_init from ret_from_fork+0x14/0x2c The reason is that the rxq|txq_stats structures are not what is expected, because in stmmac_open() -> __stmmac_open() the structure is overwritten by "memcpy(&priv->dma_conf, dma_conf, sizeof(*dma_conf));" This causes the well-initialized syncp member of rxq|txq_stats to be overwritten unexpectedly, as pointed out by Johannes and Uwe. Fix this issue by moving rxq|txq_stats back to stmmac_extra_stats. To be SMP cache friendly, we also mark stmmac_txq_stats and stmmac_rxq_stats as ____cacheline_aligned_in_smp. Fixes: 133466c3 ("net: stmmac: use per-queue 64 bit statistics where necessary") Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Tested-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Link: https://lore.kernel.org/r/20230917165328.3403-1-jszhang@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
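The hazard is easiest to see as a before/after illustration (field names are assumptions based on the description; the memcpy line is quoted from the commit): a u64_stats_sync initialized once at probe time gets wiped when the whole dma_conf structure is copied in __stmmac_open().

    /* Hedged illustration of the clobbering described above. */
    u64_stats_init(&priv->dma_conf.rx_queue[0].rxq_stats.syncp); /* probe time */

    /* later, in stmmac_open() -> __stmmac_open(): */
    memcpy(&priv->dma_conf, dma_conf, sizeof(*dma_conf)); /* wipes syncp and its
                                                           * lockdep key, hence the
                                                           * "non-static key" splat */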
-
- 18 Sep, 2023 22 commits
-
-
David S. Miller authored
Peter Lafreniere says: ==================== ax25: Update link for linux-ax25.org http://linux-ax25.org has been down for nearly a year. Its official replacement is https://linux-ax25.in-berlin.de. Update all references to the dead link to its replacement. As the three touched files are in different areas of the tree, this is being sent with one patch per file. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Peter Lafreniere authored
http://linux-ax25.org has been down for nearly a year. Its official replacement is https://linux-ax25.in-berlin.de. Change all references to the old site in the ax25 Kconfig to its replacement. Link: https://marc.info/?m=166792551600315 Signed-off-by: Peter Lafreniere <peter@n8pjl.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Peter Lafreniere authored
http://linux-ax25.org has been down for nearly a year. Its official replacement is https://linux-ax25.in-berlin.de. Update all links to the new URL. Link: https://marc.info/?m=166792551600315 Signed-off-by: Peter Lafreniere <peter@n8pjl.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Peter Lafreniere authored
http://linux-ax25.org has been down for nearly a year. Its official replacement is https://linux-ax25.in-berlin.de. Update the documentation to point there instead. And acknowledge that while the linux-hams list isn't entirely dead, it isn't what most would call 'active'. Remove that word. Link: https://marc.info/?m=166792551600315 Signed-off-by: Peter Lafreniere <peter@n8pjl.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Matthieu Baerts says: ==================== mptcp: fix stalled connections Daire reported a few issues with MPTCP where some connections were stalled in different states. Paolo did a great job fixing them. Patch 1 fixes bogus receive window shrinkage with multiple subflows. Due to a race condition and unlucky circumstances, that may lead to TCP-level window shrinkage, and the connection being stalled on the sender end. Patch 2 is a preparation for patch 3, which processes pending subflow errors on close. Without that and under specific circumstances, the MPTCP-level socket might not switch to the CLOSE state and stall. Patch 4 is also a preparation patch for the next one. Patch 5 fixes MPTCP connections not switching to the CLOSE state when all subflows have been closed but no DATA_FIN has been exchanged to explicitly close the MPTCP connection. Now connections in such a state will switch to the CLOSE state after a timeout, still allowing the "make-after-break" feature but making sure connections don't stall forever. It will be possible to modify this timeout -- currently matching the TCP TIMEWAIT value (60 seconds) -- in a future version. ==================== Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
-
Paolo Abeni authored
According to RFC 8684 section 3.3: A connection is not closed unless [...] or an implementation-specific connection-level send timeout. Currently the MPTCP protocol does not implement such a timeout, and connections timing out at the TCP level never move to the close state. Introduce a catch-up condition at subflow close time to move the MPTCP socket to close, too. That additionally allows removing the similar existing logic inside the worker. Finally, allow some additional timeout for plain ESTABLISHED mptcp sockets, as the protocol allows creating new subflows even at that point and making the connection functional again. This issue has actually been present since the beginning, but it is basically impossible to solve without a long chain of functional prerequisites topped by commit bbd49d11 ("mptcp: consolidate transition to TCP_CLOSE in mptcp_do_fastclose()"). When backporting this patch, please also backport that other commit as well. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/430 Fixes: e16163b6 ("mptcp: refactor shutdown and close") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Paolo Abeni authored
The msk socket uses two different timeouts to track close-related events and retransmissions. The existing helpers do not clearly indicate which timer they actually touch, making the related code quite confusing. Change the existing helpers' names to avoid such confusion. No functional change intended. This patch is linked to the next one ("mptcp: fix dangling connection hang-up"). The two patches are supposed to be backported together. Cc: stable@vger.kernel.org # v5.11+ Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Paolo Abeni authored
On an incoming TCP reset, subflow closing could happen before error propagation. That in turn could cause the socket error to be ignored, and a socket state transition to be missed, as reported by Daire-Byrne. Address the issue by explicitly checking for a subflow socket error at close time. To avoid code duplication, factor out of __mptcp_error_report() a new helper implementing the relevant bits. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/429 Fixes: 15cc1045 ("mptcp: deliver ssk errors to msk") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Paolo Abeni authored
This will simplify the next patch ("mptcp: process pending subflow error on close"). No functional change intended. Cc: stable@vger.kernel.org # v5.12+ Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Paolo Abeni authored
In case multiple subflows race to update the mptcp-level receive window, the subflow losing the race should use the window value provided by the "winning" subflow to update its own tcp-level rcv_wnd. To that end, the current code bogusly uses the mptcp-level rcv_wnd value as observed before the update attempt. Under unlucky circumstances that may lead to TCP-level window shrinkage and stall the other end. Address the issue by feeding the correct value to the rcv_wnd update. Fixes: f3589be0 ("mptcp: never shrink offered window") Cc: stable@vger.kernel.org Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/427 Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Gerhard Engleder says: ==================== tsnep: Fixes based on napi.rst Based on the documentation networking/napi.rst some fixes have been done. tsnep driver should be in line with this new documentation after these fixes. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Gerhard Engleder authored
According to the NAPI documentation networking/napi.rst, Rx specific APIs like page pool and XDP cannot be used at all when budget is 0. skb Tx processing should happen regardless of the budget. Stop NAPI polling after Tx processing and skip Rx processing if budget is 0. Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
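A hedged sketch of a poll handler shaped the way napi.rst asks (the tx/rx helper names are illustrative stand-ins for the driver's own): TX completion always runs, while RX work, and with it page pool and XDP handling, is skipped entirely when budget is 0.

    /* Hedged sketch of budget == 0 (netpoll) handling. */
    static int poll_sketch(struct napi_struct *napi, int budget)
    {
            bool tx_done = reap_tx_sketch(napi);    /* illustrative TX reap  */
            int rx_done = 0;

            if (budget <= 0)                        /* no RX/page pool/XDP   */
                    return budget;

            rx_done = poll_rx_sketch(napi, budget); /* illustrative RX poll  */
            if (tx_done && rx_done < budget &&
                napi_complete_done(napi, rx_done)) {
                    /* re-enable the queue interrupt here */
            }

            return rx_done;
    }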
-
Gerhard Engleder authored
According to the NAPI documentation networking/napi.rst, for the ethtool API a channel is an IRQ/NAPI which services queues of a given type. tsnep uses a single IRQ/NAPI instance for every TX/RX queue pair. Therefore, combined channels shall be returned instead of separate tx/rx channels. Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Gerhard Engleder authored
According to the NAPI documentation networking/napi.rst, drivers which have to mask interrupts explicitly should use the napi_schedule_prep() and __napi_schedule() calls. No problem seen so far with current implementation. Nevertheless, let's align the implementation with documentation. Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
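A hedged sketch of that pattern from napi.rst (the interrupt-masking helper is a hypothetical stand-in for the driver's own): the IRQ is masked only once napi_schedule_prep() confirms NAPI will actually be scheduled.

    /* Hedged sketch: mask the queue IRQ only when NAPI is really
     * being scheduled; my_mask_queue_irq() is an illustrative helper.
     */
    if (napi_schedule_prep(&queue->napi)) {
            my_mask_queue_irq(queue);
            __napi_schedule(&queue->napi);
    }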
-
David S. Miller authored
Sebastian Andrzej Siewior says: ==================== net: hsr: Properly parse HSRv1 supervisor frames. This is a follow-up to https://lore.kernel.org/all/20230825153111.228768-1-lukma@denx.de/ replacing https://lore.kernel.org/all/20230914124731.1654059-1-lukma@denx.de/ by grabbing/adding tags and reposting with a commit message, plus a missing __packed on a struct (#2), plus extending the testsuite to cover HSRv1, which is what broke here (#3-#5). HSRv0 is (was) not affected. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Sebastian Andrzej Siewior authored
The testsuite already has simple tests for HSRv0. The testsuite would have been able to notice the v1 breakage if it had been there at the time. Extend the testsuite to also cover HSRv1. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Sebastian Andrzej Siewior authored
Move the code around and group it into functions so it will be easier to extend the test to HSRv1, so that both versions are covered. Move the ping/test part into do_complete_ping_test() and the interface setup into setup_hsr_interfaces(). Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Sebastian Andrzej Siewior authored
The timeout in the while loop is never subtracted due to wrong usage of `let', leading to an endless loop if the loop condition never becomes true. Put the statement for let in quotes so it is parsed as a single statement. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Sebastian Andrzej Siewior authored
Struct hsr_sup_tlv describes HW layout and therefore it needs a __packed attribute to ensure the compiler does not add any padding. Due to the size and __packed attribute of the structs that use hsr_sup_tlv it has no functional impact. Add __packed to struct hsr_sup_tlv. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
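For illustration, a sketch of the change under the assumption that struct hsr_sup_tlv carries just the TLV type and length bytes of the on-wire supervision format (field names from memory, treat as approximate):

    /* Hedged sketch: a wire-format struct must not gain compiler padding. */
    struct hsr_sup_tlv {
            __u8 HSR_TLV_type;
            __u8 HSR_TLV_length;
    } __packed;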
-
Lukasz Majewski authored
While adding support for parsing the redbox supervision frames, the author added `pull_size' and `total_pull_size' to track the number of bytes that were pulled from the skb while parsing it, so it can be reverted/pushed back at the end. In the process a copy&paste error probably occurred, and for the HSRv1 case the ethhdr was used instead of the hsr_tag. Later the hsr_tag was used instead of hsr_sup_tag. The latter error didn't matter because both structs have the same size, so HSRv0 was still working. It did however break HSRv1 parsing because struct ethhdr is larger than struct hsr_tag. Reinstate the old pulling flow and pull first ethhdr, then hsr_tag in the v1 case, followed by hsr_sup_tag. [bigeasy: commit message] Fixes: eafaa88b ("net: hsr: Add support for redbox supervision frames") Suggested-by: Tristram.Ha@microchip.com Signed-off-by: Lukasz Majewski <lukma@denx.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
dh->dccph_x is the 9th byte (offset 8) in "struct dccp_hdr", not in the "byte 7" as Jann claimed. We need to make sure the ICMP messages are big enough, using more standard ways (no more assumptions). syzbot reported: BUG: KMSAN: uninit-value in pskb_may_pull_reason include/linux/skbuff.h:2667 [inline] BUG: KMSAN: uninit-value in pskb_may_pull include/linux/skbuff.h:2681 [inline] BUG: KMSAN: uninit-value in dccp_v6_err+0x426/0x1aa0 net/dccp/ipv6.c:94 pskb_may_pull_reason include/linux/skbuff.h:2667 [inline] pskb_may_pull include/linux/skbuff.h:2681 [inline] dccp_v6_err+0x426/0x1aa0 net/dccp/ipv6.c:94 icmpv6_notify+0x4c7/0x880 net/ipv6/icmp.c:867 icmpv6_rcv+0x19d5/0x30d0 ip6_protocol_deliver_rcu+0xda6/0x2a60 net/ipv6/ip6_input.c:438 ip6_input_finish net/ipv6/ip6_input.c:483 [inline] NF_HOOK include/linux/netfilter.h:304 [inline] ip6_input+0x15d/0x430 net/ipv6/ip6_input.c:492 ip6_mc_input+0xa7e/0xc80 net/ipv6/ip6_input.c:586 dst_input include/net/dst.h:468 [inline] ip6_rcv_finish+0x5db/0x870 net/ipv6/ip6_input.c:79 NF_HOOK include/linux/netfilter.h:304 [inline] ipv6_rcv+0xda/0x390 net/ipv6/ip6_input.c:310 __netif_receive_skb_one_core net/core/dev.c:5523 [inline] __netif_receive_skb+0x1a6/0x5a0 net/core/dev.c:5637 netif_receive_skb_internal net/core/dev.c:5723 [inline] netif_receive_skb+0x58/0x660 net/core/dev.c:5782 tun_rx_batched+0x83b/0x920 tun_get_user+0x564c/0x6940 drivers/net/tun.c:2002 tun_chr_write_iter+0x3af/0x5d0 drivers/net/tun.c:2048 call_write_iter include/linux/fs.h:1985 [inline] new_sync_write fs/read_write.c:491 [inline] vfs_write+0x8ef/0x15c0 fs/read_write.c:584 ksys_write+0x20f/0x4c0 fs/read_write.c:637 __do_sys_write fs/read_write.c:649 [inline] __se_sys_write fs/read_write.c:646 [inline] __x64_sys_write+0x93/0xd0 fs/read_write.c:646 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Uninit was created at: slab_post_alloc_hook+0x12f/0xb70 mm/slab.h:767 slab_alloc_node mm/slub.c:3478 [inline] kmem_cache_alloc_node+0x577/0xa80 mm/slub.c:3523 kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:559 __alloc_skb+0x318/0x740 net/core/skbuff.c:650 alloc_skb include/linux/skbuff.h:1286 [inline] alloc_skb_with_frags+0xc8/0xbd0 net/core/skbuff.c:6313 sock_alloc_send_pskb+0xa80/0xbf0 net/core/sock.c:2795 tun_alloc_skb drivers/net/tun.c:1531 [inline] tun_get_user+0x23cf/0x6940 drivers/net/tun.c:1846 tun_chr_write_iter+0x3af/0x5d0 drivers/net/tun.c:2048 call_write_iter include/linux/fs.h:1985 [inline] new_sync_write fs/read_write.c:491 [inline] vfs_write+0x8ef/0x15c0 fs/read_write.c:584 ksys_write+0x20f/0x4c0 fs/read_write.c:637 __do_sys_write fs/read_write.c:649 [inline] __se_sys_write fs/read_write.c:646 [inline] __x64_sys_write+0x93/0xd0 fs/read_write.c:646 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd CPU: 0 PID: 4995 Comm: syz-executor153 Not tainted 6.6.0-rc1-syzkaller-00014-ga747acc0 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/04/2023 Fixes: 977ad86c ("dccp: Fix out of bounds access in DCCP error handler") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jann Horn <jannh@google.com> Reviewed-by: Jann Horn <jannh@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
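A hedged sketch of the "more standard way" alluded to above (offsets and variable names illustrative): verify with pskb_may_pull() that the full DCCP header is present in the ICMP payload before dereferencing dccph_x, instead of assuming a fixed number of bytes.

    /* Hedged sketch: make sure the bytes are really there first. */
    if (!pskb_may_pull(skb, offset + sizeof(struct dccp_hdr)))
            return;                 /* ICMP payload too short: ignore it */
    dh = (struct dccp_hdr *)(skb->data + offset);
    if (dh->dccph_x) {
            /* long header (48-bit sequence numbers), now known present */
    }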
-
Johnathan Mantey authored
Report the carrier/no-carrier state for the network interface shared between the BMC and the passthrough channel. Without this functionality the BMC is unable to reconfigure the NIC in the event of a re-cabling to a different subnet. Signed-off-by: Johnathan Mantey <johnathanx.mantey@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 17 Sep, 2023 3 commits
-
-
David S. Miller authored
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== This series contains updates to iavf and i40e drivers. Radoslaw prevents admin queue operations being added when the driver is being removed for iavf. Petr Oros immediately starts reconfiguration on changes to VLANs on iavf. Ivan Vecera moves reset of VF to occur after port VLAN values are set on i40e. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Nothing prevents iscsi_sw_tcp_conn_bind() from receiving a file descriptor pointing to a non-TCP socket (af_unix for example). Return -EINVAL if this is attempted, instead of crashing the kernel. Fixes: 7ba24713 ("[SCSI] open-iscsi/linux-iscsi-5 Initiator: Initiator code") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Lee Duncan <lduncan@suse.com> Cc: Chris Leech <cleech@redhat.com> Cc: Mike Christie <michael.christie@oracle.com> Cc: "James E.J. Bottomley" <jejb@linux.ibm.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: open-iscsi@googlegroups.com Cc: linux-scsi@vger.kernel.org Reviewed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
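A hedged sketch of the guard described (placement and label name are assumptions): conn_bind() verifies the passed-in socket is really TCP before going any further.

    /* Hedged sketch: only a TCP socket may be bound to the connection. */
    if (!sk_is_tcp(sock->sk)) {
            err = -EINVAL;
            goto free_socket;       /* illustrative error path label */
    }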
-
Kyle Zeng authored
Currently, we assume the skb is associated with a device before calling __ip_options_compile, which is not always the case if it is re-routed by ipvs. When skb->dev is NULL, dev_net(skb->dev) becomes a NULL-pointer dereference. This patch adds a check for the edge case and switches to using the net_device from the rtable when skb->dev is NULL. Fixes: ed0de45a ("ipv4: recompile ip options in ipv4_link_failure") Suggested-by: David Ahern <dsahern@kernel.org> Signed-off-by: Kyle Zeng <zengyhkyle@gmail.com> Cc: Stephen Suryaputra <ssuryaextr@gmail.com> Cc: Vadim Fedorenko <vfedorenko@novek.ru> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
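A hedged sketch of the fallback described (not the exact hunk): when the skb carries no device, the net namespace is taken from the device of the attached route instead.

    /* Hedged sketch: prefer skb->dev, fall back to the route's device. */
    struct net *net = skb->dev ? dev_net(skb->dev)
                               : dev_net(skb_rtable(skb)->dst.dev);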
-
- 16 Sep, 2023 2 commits
-
-
Vinicius Costa Gomes authored
When an XDP redirect happens before the link is ready, that transmission will not finish and will time out, causing an adapter reset. If the redirects do not stop, the adapter will not stop resetting. Wait for the driver to signal that there's a carrier before allowing transmissions to proceed. The previous code relied on the NIC being ready to transmit once __IGC_DOWN is cleared, as all the queues are ready; what actually happens is that carrier presence will only be signaled later, after the watchdog workqueue has had a chance to run. And during this interval (between clearing __IGC_DOWN and the watchdog running), if any transmission happens the timeout is emitted (detected by igc_tx_timeout()), which causes the reset, with the potential for the infinite loop. Fixes: 4ff32036 ("igc: Add support for XDP_REDIRECT action") Reported-by: Ferenc Fejes <ferenc.fejes@ericsson.com> Closes: https://lore.kernel.org/netdev/0caf33cf6adb3a5bf137eeaa20e89b167c9986d5.camel@ericsson.com/ Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Ferenc Fejes <ferenc.fejes@ericsson.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
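A hedged sketch of the kind of guard described for the XDP transmit path (exact placement and error code are assumptions): frames are refused until the adapter is up and carrier has been signaled.

    /* Hedged sketch: no XDP transmissions before link-up is reported. */
    if (test_bit(__IGC_DOWN, &adapter->state) ||
        !netif_carrier_ok(adapter->netdev))
            return -ENETDOWN;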
-
Andy Shevchenko authored
Use bitmap_zalloc() and bitmap_free() instead of hand-writing them. It is less verbose and it improves the type checking and semantics. While at it, add the missing header inclusion (it should have been bitops.h, but with the above change it becomes bitmap.h). Suggested-by: Sergey Ryazanov <ryazanov.s.a@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230911154534.4174265-1-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
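A hedged before/after sketch of such a conversion (the 'map' and 'nbits' names are illustrative, not taken from the driver):

    /* Before: hand-rolled allocation and free of a bitmap. */
    map = kcalloc(BITS_TO_LONGS(nbits), sizeof(unsigned long), GFP_KERNEL);
    kfree(map);

    /* After: dedicated helpers from <linux/bitmap.h>, clearer intent. */
    map = bitmap_zalloc(nbits, GFP_KERNEL);
    bitmap_free(map);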
-