- 11 Oct, 2023 10 commits
-
-
Arkadiusz Bokowy authored
When the vhci device is opened in the two-step way, i.e.: open device then write a vendor packet with requested controller type, the device shall respond with a vendor packet which includes HCI index of created interface. When the virtual HCI is created, the host sends a reset request to the controller. This request is processed by the vhci_send_frame() function. However, this request is send by a different thread, so it might happen that this HCI request will be received before the vendor response is queued in the read queue. This results in the HCI vendor response and HCI reset request inversion in the read queue which leads to improper behavior of btvirt: > dmesg [1754256.640122] Bluetooth: MGMT ver 1.22 [1754263.023806] Bluetooth: MGMT ver 1.22 [1754265.043775] Bluetooth: hci1: Opcode 0x c03 failed: -110 In order to synchronize vhci two-step open/setup process with virtual HCI initialization, this patch adds internal lock when queuing data in the vhci_send_frame() function. Signed-off-by: Arkadiusz Bokowy <arkadiusz.bokowy@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
-
Nils Hoppmann authored
SMC_STAT_PAYLOAD_SUB(_smc_stats, _tech, key, _len, _rc) will calculate wrong bucket positions for payloads of exactly 4096 bytes and (1 << (m + 12)) bytes, with m == SMC_BUF_MAX - 1. Intended bucket distribution: Assume l == size of payload, m == SMC_BUF_MAX - 1. Bucket 0 : 0 < l <= 2^13 Bucket n, 1 <= n <= m-1 : 2^(n+12) < l <= 2^(n+13) Bucket m : l > 2^(m+12) Current solution: _pos = fls64((l) >> 13) [...] _pos = (_pos < m) ? ((l == 1 << (_pos + 12)) ? _pos - 1 : _pos) : m For l == 4096, _pos == -1, but should be _pos == 0. For l == (1 << (m + 12)), _pos == m, but should be _pos == m - 1. In order to avoid special treatment of these corner cases, the calculation is adjusted. The new solution first subtracts the length by one, and then calculates the correct bucket by shifting accordingly, i.e. _pos = fls64((l - 1) >> 13), l > 0. This not only fixes the issues named above, but also makes the whole bucket assignment easier to follow. Same is done for SMC_STAT_RMB_SIZE_SUB(_smc_stats, _tech, k, _len), where the calculation of the bucket position is similar to the one named above. Fixes: e0e4b8fa ("net/smc: Add SMC statistics support") Suggested-by: Halil Pasic <pasic@linux.ibm.com> Signed-off-by: Nils Hoppmann <niho@linux.ibm.com> Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Reviewed-by: Dust Li <dust.li@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yanguo Li authored
When there are CT table entries, and you rmmod nfp, the following events can happen: task1: nfp_net_pci_remove ↓ nfp_flower_stop->(asynchronous)tcf_ct_flow_table_cleanup_work(3) ↓ nfp_zone_table_entry_destroy(1) task2: nfp_fl_ct_handle_nft_flow(2) When the execution order is (1)->(2)->(3), it will crash. Therefore, in the function nfp_fl_ct_del_flow, nf_flow_table_offload_del_cb needs to be executed synchronously. At the same time, in order to solve the deadlock problem and the problem of rtnl_lock sometimes failing, replace rtnl_lock with the private nfp_fl_lock. Fixes: 7cc93d88 ("nfp: flower-ct: remove callback delete deadlock") Cc: stable@vger.kernel.org Signed-off-by: Yanguo Li <yanguo.li@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Javier Carrasco authored
syzbot has found an uninit-value bug triggered by the dm9601 driver [1]. This error happens because the variable res is not updated if the call to dm_read_shared_word returns an error. In this particular case -EPROTO was returned and res stayed uninitialized. This can be avoided by checking the return value of dm_read_shared_word and propagating the error if the read operation failed. [1] https://syzkaller.appspot.com/bug?extid=1f53a30781af65d2c955 Cc: stable@vger.kernel.org Signed-off-by: Javier Carrasco <javier.carrasco.cruz@gmail.com> Reported-and-tested-by: syzbot+1f53a30781af65d2c955@syzkaller.appspotmail.com Acked-by: Peter Korsgaard <peter@korsgaard.com> Fixes: d0374f4f ("USB: Davicom DM9601 usbnet driver") Link: https://lore.kernel.org/r/20231009-topic-dm9601_uninit_mdio_read-v2-1-f2fe39739b6c@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfJakub Kicinski authored
Daniel Borkmann says: ==================== pull-request: bpf 2023-10-11 We've added 14 non-merge commits during the last 5 day(s) which contain a total of 12 files changed, 398 insertions(+), 104 deletions(-). The main changes are: 1) Fix s390 JIT backchain issues in the trampoline code generation which previously clobbered the caller's backchain, from Ilya Leoshkevich. 2) Fix zero-size allocation warning in xsk sockets when the configured ring size was close to SIZE_MAX, from Andrew Kanner. 3) Fixes for bpf_mprog API that were found when implementing support in the ebpf-go library along with selftests, from Daniel Borkmann and Lorenz Bauer. 4) Fix riscv JIT to properly sign-extend the return register in programs. This fixes various test_progs selftests on riscv, from Björn Töpel. 5) Fix verifier log for async callback return values where the allowed range was displayed incorrectly, from David Vernet. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: s390/bpf: Fix unwinding past the trampoline s390/bpf: Fix clobbering the caller's backchain in the trampoline selftests/bpf: Add testcase for async callback return value failure bpf: Fix verifier log for async callback return values xdp: Fix zero-size allocation warning in xskq_create() riscv, bpf: Track both a0 (RISC-V ABI) and a5 (BPF) return values riscv, bpf: Sign-extend return values selftests/bpf: Make seen_tc* variable tests more robust selftests/bpf: Test query on empty mprog and pass revision into attach selftests/bpf: Adapt assert_mprog_count to always expect 0 count selftests/bpf: Test bpf_mprog query API via libbpf and raw syscall bpf: Refuse unused attributes in bpf_prog_{attach,detach} bpf: Handle bpf_mprog_query with NULL entry bpf: Fix BPF_PROG_QUERY last field check ==================== Link: https://lore.kernel.org/r/20231010223610.3984-1-daniel@iogearbox.netSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Kory Maincent authored
A bitset without mask in a _SET request means we want exactly the bits in the bitset to be set. This works correctly for compact format but when verbose format is parsed, ethnl_update_bitset32_verbose() only sets the bits present in the request bitset but does not clear the rest. The commit 66991703 fixes this issue by clearing the whole target bitmap before we start iterating. The solution proposed brought an issue with the behavior of the mod variable. As the bitset is always cleared the old val will always differ to the new val. Fix it by adding a new temporary variable which save the state of the old bitmap. Fixes: 66991703 ("ethtool: fix application of verbose no_mask bitset") Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20231009133645.44503-1-kory.maincent@bootlin.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Merge tag 'linux-can-fixes-for-6.6-20231009' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2023-10-09 Lukas Magel's patch for the CAN ISO-TP protocol fixes the TX state detection and wait behavior. John Watts contributes a patch to only show the sun4i_can Kconfig option on ARCH_SUNXI. A patch by Miquel Raynal fixes the soft-reset workaround for Renesas SoCs in the sja1000 driver. Markus Schneider-Pargmann's patch for the tcan4x5x m_can glue driver fixes the id2 register for the tcan4553. 2 patches by Haibo Chen fix the flexcan stop mode for the imx93 SoC. * tag 'linux-can-fixes-for-6.6-20231009' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can: can: tcan4x5x: Fix id2_register for tcan4553 can: flexcan: remove the auto stop mode for IMX93 can: sja1000: Always restart the Tx queue after an overrun arm64: dts: imx93: add the Flex-CAN stop mode by GPR can: sun4i_can: Only show Kconfig if ARCH_SUNXI is set can: isotp: isotp_sendmsg(): fix TX state detection and wait behavior ==================== Link: https://lore.kernel.org/r/20231009085256.693378-1-mkl@pengutronix.deSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Eric Dumazet authored
Sili Luo reported a race in nfc_llcp_sock_get(), leading to UAF. Getting a reference on the socket found in a lookup while holding a lock should happen before releasing the lock. nfc_llcp_sock_get_sn() has a similar problem. Finally nfc_llcp_recv_snl() needs to make sure the socket found by nfc_llcp_sock_from_sn() does not disappear. Fixes: 8f50020e ("NFC: LLCP late binding") Reported-by: Sili Luo <rootlab@huawei.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willy Tarreau <w@1wt.eu> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20231009123110.3735515-1-edumazet@google.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jeremy Kerr authored
Our current route lookups (mctp_route_lookup and mctp_route_lookup_null) traverse the net's route list without the RCU read lock held. This means the route lookup is subject to preemption, resulting in an potential grace period expiry, and so an eventual kfree() while we still have the route pointer. Add the proper read-side critical section locks around the route lookups, preventing premption and a possible parallel kfree. The remaining net->mctp.routes accesses are already under a rcu_read_lock, or protected by the RTNL for updates. Based on an analysis from Sili Luo <rootlab@huawei.com>, where introducing a delay in the route lookup could cause a UAF on simultaneous sendmsg() and route deletion. Reported-by: Sili Luo <rootlab@huawei.com> Fixes: 889b7da2 ("mctp: Add initial routing framework") Cc: stable@vger.kernel.org Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/29c4b0e67dc1bf3571df3982de87df90cae9b631.1696837310.git.jk@codeconstruct.com.auSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Randy Dunlap authored
Correct punctuation and drop an extraneous word. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20231008214121.25940-1-rdunlap@infradead.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 10 Oct, 2023 11 commits
-
-
Ilya Leoshkevich authored
When functions called by the trampoline panic, the backtrace that is printed stops at the trampoline, because the trampoline does not store its caller's frame address (backchain) on stack; it also stores the return address at a wrong location. Store both the same way as is already done for the regular eBPF programs. Fixes: 528eb2cb ("s390/bpf: Implement arch_prepare_bpf_trampoline()") Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20231010203512.385819-3-iii@linux.ibm.com
-
Ilya Leoshkevich authored
One of the first things that s390x kernel functions do is storing the the caller's frame address (backchain) on stack. This makes unwinding possible. The backchain is always stored at frame offset 152, which is inside the 160-byte stack area, that the functions allocate for their callees. The callees must preserve the backchain; the remaining 152 bytes they may use as they please. Currently the trampoline uses all 160 bytes, clobbering the backchain. This causes kernel panics when using __builtin_return_address() in functions called by the trampoline. Fix by reducing the usage of the caller-reserved stack area by 8 bytes in the trampoline. Fixes: 528eb2cb ("s390/bpf: Implement arch_prepare_bpf_trampoline()") Reported-by: Song Liu <song@kernel.org> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20231010203512.385819-2-iii@linux.ibm.com
-
Will Mortensen authored
Commit 1e662209 ("net/mlx5e: Update rx ring hw mtu upon each rx-fcs flag change") seems to have accidentally inverted the logic added in commit 0bc73ad4 ("net/mlx5e: Mutually exclude RX-FCS and RX-port-timestamp"). The impact of this is a little unclear since it seems the FCS scattered with RX-FCS is (usually?) correct regardless. Fixes: 1e662209 ("net/mlx5e: Update rx ring hw mtu upon each rx-fcs flag change") Tested-by: Charlotte Tan <charlotte@extrahop.com> Reviewed-by: Charlotte Tan <charlotte@extrahop.com> Cc: Adham Faris <afaris@nvidia.com> Cc: Aya Levin <ayal@nvidia.com> Cc: Tariq Toukan <tariqt@nvidia.com> Cc: Moshe Shemesh <moshe@nvidia.com> Cc: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Will Mortensen <will@extrahop.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20231006053706.514618-1-will@extrahop.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Gerd Bayer authored
When the SMC protocol is built into the kernel proper while ISM is configured to be built as module, linking the kernel fails due to unresolved dependencies out of net/smc/smc_ism.o to ism_get_smcd_ops, ism_register_client, and ism_unregister_client as reported via the linux-next test automation (see link). This however is a bug introduced a while ago. Correct the dependency list in ISM's and SMC's Kconfig to reflect the dependencies that are actually inverted. With this you cannot build a kernel with CONFIG_SMC=y and CONFIG_ISM=m. Either ISM needs to be 'y', too - or a 'n'. That way, SMC can still be configured on non-s390 architectures that do not have (nor need) an ISM driver. Fixes: 89e7d2ba ("net/ism: Add new API for client registration") Reported-by: Randy Dunlap <rdunlap@infradead.org> Closes: https://lore.kernel.org/linux-next/d53b5b50-d894-4df8-8969-fd39e63440ae@infradead.org/Co-developed-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Simon Horman <horms@kernel.org> # build-tested Acked-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Link: https://lore.kernel.org/r/20231006125847.1517840-1-gbayer@linux.ibm.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Dan Carpenter authored
The adapter->vf_mvs.l list needs to be initialized even if the list is empty. Otherwise it will lead to crashes. Fixes: a1cbb15c ("ixgbe: Add macvlan support for VF") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Link: https://lore.kernel.org/r/ZSADNdIw8zFx1xw2@kadamSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Paolo Abeni authored
Radu Pirea says: ==================== Add update_pn flag Patches extracted from https://lore.kernel.org/all/20230928084430.1882670-1-radu-nicolae.pirea@oss.nxp.com/ Update_pn flag will let the offloaded MACsec implementations to know when the PN is updated. ==================== Link: https://lore.kernel.org/r/20231005180636.672791-1-radu-nicolae.pirea@oss.nxp.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Radu Pirea (NXP OSS) authored
When updating the SA, use the new update_pn flags instead of comparing the new PN with the initial one. Comparing the initial PN value with the new value will allow the user to update the SA using the initial PN value as a parameter like this: $ ip macsec add macsec0 tx sa 0 pn 1 on key 00 \ ead3664f508eb06c40ac7104cdae4ce5 $ ip macsec set macsec0 tx sa 0 pn 1 off Fixes: 8ff0ac5b ("net/mlx5: Add MACsec offload Tx command support") Fixes: aae3454e ("net/mlx5e: Add MACsec offload Rx command support") Signed-off-by: Radu Pirea (NXP OSS) <radu-nicolae.pirea@oss.nxp.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Radu Pirea (NXP OSS) authored
Updating the PN is not supported. Return -EINVAL if update_pn is true. The following command succeeded, but it should fail because the driver does not update the PN: ip macsec set macsec0 tx sa 0 pn 232 on Fixes: 28c5107a ("net: phy: mscc: macsec support") Signed-off-by: Radu Pirea (NXP OSS) <radu-nicolae.pirea@oss.nxp.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Radu Pirea (NXP OSS) authored
When updating SA, update the PN only when the update_pn flag is true. Otherwise, the PN will be reset to its previous value using the following command and this should not happen: $ ip macsec set macsec0 tx sa 0 on Fixes: c54ffc73 ("octeontx2-pf: mcs: Introduce MACSEC hardware offloading") Signed-off-by: Radu Pirea (NXP OSS) <radu-nicolae.pirea@oss.nxp.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Radu Pirea (NXP OSS) authored
Indicate next PN update using update_pn flag in macsec_context. Offloaded MACsec implementations does not know whether or not the MACSEC_SA_ATTR_PN attribute was passed for an SA update and assume that next PN should always updated, but this is not always true. The PN can be reset to its initial value using the following command: $ ip macsec set macsec0 tx sa 0 off #octeontx2-pf case Or, the update PN command will succeed even if the driver does not support PN updates. $ ip macsec set macsec0 tx sa 0 pn 1 on #mscc phy driver case Comparing the initial PN with the new PN value is not a solution. When the user updates the PN using its initial value the command will succeed, even if the driver does not support it. Like this: $ ip macsec add macsec0 tx sa 0 pn 1 on key 00 \ ead3664f508eb06c40ac7104cdae4ce5 $ ip macsec set macsec0 tx sa 0 pn 1 on #mlx5 case Signed-off-by: Radu Pirea (NXP OSS) <radu-nicolae.pirea@oss.nxp.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Eric Dumazet authored
syzbot uses panic_on_warn. This means that the skb_dump() I added in the blamed commit are not even called. Rewrite this so that we get the needed skb dump before syzbot crashes. Fixes: eeee4b77 ("net: add more debug info in skb_checksum_help()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20231006173355.2254983-1-edumazet@google.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 09 Oct, 2023 5 commits
-
-
David Vernet authored
A previous commit updated the verifier to print an accurate failure message for when someone specifies a nonzero return value from an async callback. This adds a testcase for validating that the verifier emits the correct message in such a case. Signed-off-by: David Vernet <void@manifault.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20231009161414.235829-2-void@manifault.com
-
David Vernet authored
The verifier, as part of check_return_code(), verifies that async callbacks such as from e.g. timers, will return 0. It does this by correctly checking that R0->var_off is in tnum_const(0), which effectively checks that it's in a range of 0. If this condition fails, however, it prints an error message which says that the value should have been in (0x0; 0x1). This results in possibly confusing output such as the following in which an async callback returns 1: At async callback the register R0 has value (0x1; 0x0) should have been in (0x0; 0x1) The fix is easy -- we should just pass the tnum_const(0) as the correct range to verbose_invalid_scalar(), which will then print the following: At async callback the register R0 has value (0x1; 0x0) should have been in (0x0; 0x0) Fixes: bfc6bb74 ("bpf: Implement verifier support for validation of async callbacks.") Signed-off-by: David Vernet <void@manifault.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20231009161414.235829-1-void@manifault.com
-
Andrew Kanner authored
Syzkaller reported the following issue: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 2807 at mm/vmalloc.c:3247 __vmalloc_node_range (mm/vmalloc.c:3361) Modules linked in: CPU: 0 PID: 2807 Comm: repro Not tainted 6.6.0-rc2+ #12 Hardware name: Generic DT based system unwind_backtrace from show_stack (arch/arm/kernel/traps.c:258) show_stack from dump_stack_lvl (lib/dump_stack.c:107 (discriminator 1)) dump_stack_lvl from __warn (kernel/panic.c:633 kernel/panic.c:680) __warn from warn_slowpath_fmt (./include/linux/context_tracking.h:153 kernel/panic.c:700) warn_slowpath_fmt from __vmalloc_node_range (mm/vmalloc.c:3361 (discriminator 3)) __vmalloc_node_range from vmalloc_user (mm/vmalloc.c:3478) vmalloc_user from xskq_create (net/xdp/xsk_queue.c:40) xskq_create from xsk_setsockopt (net/xdp/xsk.c:953 net/xdp/xsk.c:1286) xsk_setsockopt from __sys_setsockopt (net/socket.c:2308) __sys_setsockopt from ret_fast_syscall (arch/arm/kernel/entry-common.S:68) xskq_get_ring_size() uses struct_size() macro to safely calculate the size of struct xsk_queue and q->nentries of desc members. But the syzkaller repro was able to set q->nentries with the value initially taken from copy_from_sockptr() high enough to return SIZE_MAX by struct_size(). The next PAGE_ALIGN(size) is such case will overflow the size_t value and set it to 0. This will trigger WARN_ON_ONCE in vmalloc_user() -> __vmalloc_node_range(). The issue is reproducible on 32-bit arm kernel. Fixes: 9f78bf33 ("xsk: support use vaddr as ring") Reported-by: syzbot+fae676d3cf469331fc89@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/000000000000c84b4705fb31741e@google.com/T/ Reported-by: syzbot+b132693e925cbbd89e26@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/000000000000e20df20606ebab4f@google.com/T/Signed-off-by: Andrew Kanner <andrew.kanner@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: syzbot+fae676d3cf469331fc89@syzkaller.appspotmail.com Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://syzkaller.appspot.com/bug?extid=fae676d3cf469331fc89 Link: https://lore.kernel.org/bpf/20231007075148.1759-1-andrew.kanner@gmail.com
-
Björn Töpel authored
The RISC-V BPF uses a5 for BPF return values, which are zero-extended, whereas the RISC-V ABI uses a0 which is sign-extended. In other words, a5 and a0 can differ, and are used in different context. The BPF trampoline are used for both BPF programs, and regular kernel functions. Make sure that the RISC-V BPF trampoline saves, and restores both a0 and a5. Fixes: 49b5e77a ("riscv, bpf: Add bpf trampoline support for RV64") Signed-off-by: Björn Töpel <bjorn@rivosinc.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20231004120706.52848-3-bjorn@kernel.org
-
Björn Töpel authored
The RISC-V architecture does not expose sub-registers, and hold all 32-bit values in a sign-extended format [1] [2]: | The compiler and calling convention maintain an invariant that all | 32-bit values are held in a sign-extended format in 64-bit | registers. Even 32-bit unsigned integers extend bit 31 into bits | 63 through 32. Consequently, conversion between unsigned and | signed 32-bit integers is a no-op, as is conversion from a signed | 32-bit integer to a signed 64-bit integer. While BPF, on the other hand, exposes sub-registers, and use zero-extension (similar to arm64/x86). This has led to some subtle bugs, where a BPF JITted program has not sign-extended the a0 register (return value in RISC-V land), passed the return value up the kernel, e.g.: | int from_bpf(void); | | long foo(void) | { | return from_bpf(); | } Here, a0 would be 0xffff_ffff, instead of the expected 0xffff_ffff_ffff_ffff. Internally, the RISC-V JIT uses a5 as a dedicated register for BPF return values. Keep a5 zero-extended, but explicitly sign-extend a0 (which is used outside BPF land). Now that a0 (RISC-V ABI) and a5 (BPF ABI) differs, a0 is only moved to a5 for non-BPF native calls (BPF_PSEUDO_CALL). Fixes: 2353ecc6 ("bpf, riscv: add BPF JIT for RV64G") Signed-off-by: Björn Töpel <bjorn@rivosinc.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://github.com/riscv/riscv-isa-manual/releases/download/riscv-isa-release-056b6ff-2023-10-02/unpriv-isa-asciidoc.pdf # [2] Link: https://github.com/riscv-non-isa/riscv-elf-psabi-doc/releases/download/draft-20230929-e5c800e661a53efe3c2678d71a306323b60eb13b/riscv-abi.pdf # [2] Link: https://lore.kernel.org/bpf/20231004120706.52848-2-bjorn@kernel.org
-
- 08 Oct, 2023 3 commits
-
-
Michal Swiatkowski authored
When one of the LAG interfaces is in switchdev mode, setting default rule can't be done. The interface on which switchdev is running has ice_set_rx_mode() blocked to avoid default rule adding (and other rules). The other interfaces (without switchdev running but connected via bond with interface that runs switchdev) can't follow the same scheme, because rx filtering needs to be disabled when failover happens. Notification for bridge to set promisc mode seems like good place to do that. Fixes: bb52f42a ("ice: Add driver support for firmware changes for LAG") Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Roger Pau Monne authored
Do not set netback interfaces (vifs) default TX queue size to the ring size. The TX queue size is not related to the ring size, and using the ring size (32) as the queue size can lead to packet drops. Note the TX side of the vif interface in the netback domain is the one receiving packets to be injected to the guest. Do not explicitly set the TX queue length to any value when creating the interface, and instead use the system default. Note that the queue length can also be adjusted at runtime. Fixes: f942dc25 ('xen network backend driver') Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com> Acked-by: Wei Liu <wei.liu@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Dan Carpenter authored
The mlxsw_sp2_nve_vxlan_learning_set() function is supposed to return zero on success or negative error codes. So it needs to be type int instead of bool. Fixes: 4ee70efa ("mlxsw: spectrum_nve: Add support for VXLAN on Spectrum-2") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 07 Oct, 2023 7 commits
-
-
Daniel Borkmann authored
Martin reported that on his local dev machine the test_tc_chain_mixed() fails as "test_tc_chain_mixed:FAIL:seen_tc5 unexpected seen_tc5: actual 1 != expected 0" and others occasionally, too. However, when running in a more isolated setup (qemu in particular), it works fine for him. The reason is that there is a small race-window where seen_tc* could turn into true for various test cases when there is background traffic, e.g. after the asserts they often get reset. In such case when subsequent detach takes place, unrelated background traffic could have already flipped the bool to true beforehand. Add a small helper tc_skel_reset_all_seen() to reset all bools before we do the ping test. At this point, everything is set up as expected and therefore no race can occur. All tc_{opts,links} tests continue to pass after this change. Reported-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20231006220655.1653-7-daniel@iogearbox.netSigned-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
Daniel Borkmann authored
Add a new test case to query on an empty bpf_mprog and pass the revision directly into expected_revision for attachment to assert that this does succeed. ./test_progs -t tc_opts [ 1.406778] tsc: Refined TSC clocksource calibration: 3407.990 MHz [ 1.408863] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x311fcaf6eb0, max_idle_ns: 440795321766 ns [ 1.412419] clocksource: Switched to clocksource tsc [ 1.428671] bpf_testmod: loading out-of-tree module taints kernel. [ 1.430260] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel #252 tc_opts_after:OK #253 tc_opts_append:OK #254 tc_opts_basic:OK #255 tc_opts_before:OK #256 tc_opts_chain_classic:OK #257 tc_opts_chain_mixed:OK #258 tc_opts_delete_empty:OK #259 tc_opts_demixed:OK #260 tc_opts_detach:OK #261 tc_opts_detach_after:OK #262 tc_opts_detach_before:OK #263 tc_opts_dev_cleanup:OK #264 tc_opts_invalid:OK #265 tc_opts_max:OK #266 tc_opts_mixed:OK #267 tc_opts_prepend:OK #268 tc_opts_query:OK #269 tc_opts_query_attach:OK <--- (new test) #270 tc_opts_replace:OK #271 tc_opts_revision:OK Summary: 20/0 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20231006220655.1653-6-daniel@iogearbox.netSigned-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
Daniel Borkmann authored
Simplify __assert_mprog_count() to remove the -ENOENT corner case as the bpf_prog_query() now returns 0 when no bpf_mprog is attached. This also allows to convert a few test cases from using raw __assert_mprog_count() over to plain assert_mprog_count() helper. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20231006220655.1653-5-daniel@iogearbox.netSigned-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
Daniel Borkmann authored
Add a new test case which performs double query of the bpf_mprog through libbpf API, but also via raw bpf(2) syscall. This is testing to gather first the count and then in a subsequent probe the full information with the program array without clearing passed structs in between. # ./vmtest.sh -- ./test_progs -t tc_opts [...] ./test_progs -t tc_opts [ 1.398818] tsc: Refined TSC clocksource calibration: 3407.999 MHz [ 1.400263] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x311fd336761, max_idle_ns: 440795243819 ns [ 1.402734] clocksource: Switched to clocksource tsc [ 1.426639] bpf_testmod: loading out-of-tree module taints kernel. [ 1.428112] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel #252 tc_opts_after:OK #253 tc_opts_append:OK #254 tc_opts_basic:OK #255 tc_opts_before:OK #256 tc_opts_chain_classic:OK #257 tc_opts_chain_mixed:OK #258 tc_opts_delete_empty:OK #259 tc_opts_demixed:OK #260 tc_opts_detach:OK #261 tc_opts_detach_after:OK #262 tc_opts_detach_before:OK #263 tc_opts_dev_cleanup:OK #264 tc_opts_invalid:OK #265 tc_opts_max:OK #266 tc_opts_mixed:OK #267 tc_opts_prepend:OK #268 tc_opts_query:OK <--- (new test) #269 tc_opts_replace:OK #270 tc_opts_revision:OK Summary: 19/0 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20231006220655.1653-4-daniel@iogearbox.netSigned-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
Lorenz Bauer authored
The recently added tcx attachment extended the BPF UAPI for attaching and detaching by a couple of fields. Those fields are currently only supported for tcx, other types like cgroups and flow dissector silently ignore the new fields except for the new flags. This is problematic once we extend bpf_mprog to older attachment types, since it's hard to figure out whether the syscall really was successful if the kernel silently ignores non-zero values. Explicitly reject non-zero fields relevant to bpf_mprog for attachment types which don't use the latter yet. Fixes: e420bed0 ("bpf: Add fd-based tcx multi-prog infra with link support") Signed-off-by: Lorenz Bauer <lmb@isovalent.com> Co-developed-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20231006220655.1653-3-daniel@iogearbox.netSigned-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
Daniel Borkmann authored
Improve consistency for bpf_mprog_query() API and let the latter also handle a NULL entry as can be the case for tcx. Instead of returning -ENOENT, we copy a count of 0 and revision of 1 to user space, so that this can be fed into a subsequent bpf_mprog_attach() call as expected_revision. A BPF self- test as part of this series has been added to assert this case. Suggested-by: Lorenz Bauer <lmb@isovalent.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20231006220655.1653-2-daniel@iogearbox.netSigned-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
Daniel Borkmann authored
While working on the ebpf-go [0] library integration for bpf_mprog and tcx, Lorenz noticed that two subsequent BPF_PROG_QUERY requests currently fail. A typical workflow is to first gather the bpf_mprog count without passing program/ link arrays, followed by the second request which contains the actual array pointers. The initial call populates count and revision fields. The second call gets rejected due to a BPF_PROG_QUERY_LAST_FIELD bug which should point to query.revision instead of query.link_attach_flags since the former is really the last member. It was not noticed in libbpf as bpf_prog_query_opts() always calls bpf(2) with an on-stack bpf_attr that is memset() each time (and therefore query.revision was reset to zero). [0] https://ebpf-go.dev Fixes: e420bed0 ("bpf: Add fd-based tcx multi-prog infra with link support") Reported-by: Lorenz Bauer <lmb@isovalent.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20231006220655.1653-1-daniel@iogearbox.netSigned-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
- 06 Oct, 2023 4 commits
-
-
Jakub Kicinski authored
Yoshihiro Shimoda says: ==================== ravb: Fix use-after-free issues This patch series fixes use-after-free issues in ravb_remove(). The original patch is made by Zheng Wang [1]. And, I made the patch 1/2 which I found other issue in the ravb_remove(). [1] https://lore.kernel.org/netdev/20230725030026.1664873-1-zyytlz.wz@163.com/ v1: https://lore.kernel.org/all/20231004091253.4194205-1-yoshihiro.shimoda.uh@renesas.com/ ==================== Link: https://lore.kernel.org/r/20231005011201.14368-1-yoshihiro.shimoda.uh@renesas.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Yoshihiro Shimoda authored
The ravb_stop() should call cancel_work_sync(). Otherwise, ravb_tx_timeout_work() is possible to use the freed priv after ravb_remove() was called like below: CPU0 CPU1 ravb_tx_timeout() ravb_remove() unregister_netdev() free_netdev(ndev) // free priv ravb_tx_timeout_work() // use priv unregister_netdev() will call .ndo_stop() so that ravb_stop() is called. And, after phy_stop() is called, netif_carrier_off() is also called. So that .ndo_tx_timeout() will not be called after phy_stop(). Fixes: c156633f ("Renesas Ethernet AVB driver proper") Reported-by: Zheng Wang <zyytlz.wz@163.com> Closes: https://lore.kernel.org/netdev/20230725030026.1664873-1-zyytlz.wz@163.com/Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Link: https://lore.kernel.org/r/20231005011201.14368-3-yoshihiro.shimoda.uh@renesas.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Yoshihiro Shimoda authored
In ravb_remove(), dma_free_coherent() should be call after unregister_netdev(). Otherwise, this controller is possible to use the freed buffer. Fixes: c156633f ("Renesas Ethernet AVB driver proper") Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Link: https://lore.kernel.org/r/20231005011201.14368-2-yoshihiro.shimoda.uh@renesas.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Moshe Shemesh authored
Devlink health dump get callback should take devlink lock as any other devlink callback. Otherwise, since devlink_mutex was removed, this callback is not protected from a race of the reporter being destroyed while handling the callback. Add devlink lock to the callback and to any call for devlink_health_do_dump(). This should be safe as non of the drivers dump callback implementation takes devlink lock. As devlink lock is added to any callback of dump, the reporter dump_lock is now redundant and can be removed. Fixes: d3efc2a6 ("net: devlink: remove devlink_mutex") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://lore.kernel.org/r/1696510216-189379-1-git-send-email-moshe@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-