- 10 Aug, 2021 39 commits
-
-
Yang Yingliang authored
I got a memleak report: BUG: memory leak unreferenced object 0x607ee521a658 (size 240): comm "syz-executor.0", pid 955, jiffies 4294780569 (age 16.449s) hex dump (first 32 bytes, cpu 1): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000d830ea5a>] br_multicast_add_port+0x1c2/0x300 net/bridge/br_multicast.c:1693 [<00000000274d9a71>] new_nbp net/bridge/br_if.c:435 [inline] [<00000000274d9a71>] br_add_if+0x670/0x1740 net/bridge/br_if.c:611 [<0000000012ce888e>] do_set_master net/core/rtnetlink.c:2513 [inline] [<0000000012ce888e>] do_set_master+0x1aa/0x210 net/core/rtnetlink.c:2487 [<0000000099d1cafc>] __rtnl_newlink+0x1095/0x13e0 net/core/rtnetlink.c:3457 [<00000000a01facc0>] rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3488 [<00000000acc9186c>] rtnetlink_rcv_msg+0x369/0xa10 net/core/rtnetlink.c:5550 [<00000000d4aabb9c>] netlink_rcv_skb+0x134/0x3d0 net/netlink/af_netlink.c:2504 [<00000000bc2e12a3>] netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline] [<00000000bc2e12a3>] netlink_unicast+0x4a0/0x6a0 net/netlink/af_netlink.c:1340 [<00000000e4dc2d0e>] netlink_sendmsg+0x789/0xc70 net/netlink/af_netlink.c:1929 [<000000000d22c8b3>] sock_sendmsg_nosec net/socket.c:654 [inline] [<000000000d22c8b3>] sock_sendmsg+0x139/0x170 net/socket.c:674 [<00000000e281417a>] ____sys_sendmsg+0x658/0x7d0 net/socket.c:2350 [<00000000237aa2ab>] ___sys_sendmsg+0xf8/0x170 net/socket.c:2404 [<000000004f2dc381>] __sys_sendmsg+0xd3/0x190 net/socket.c:2433 [<0000000005feca6c>] do_syscall_64+0x37/0x90 arch/x86/entry/common.c:47 [<000000007304477d>] entry_SYSCALL_64_after_hwframe+0x44/0xae On error path of br_add_if(), p->mcast_stats allocated in new_nbp() need be freed, or it will be leaked. Fixes: 1080ab95 ("net: bridge: add support for IGMP/MLD stats and export them via netlink") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Link: https://lore.kernel.org/r/20210809132023.978546-1-yangyingliang@huawei.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Vladimir Oltean authored
net: switchdev: zero-initialize struct switchdev_notifier_fdb_info emitted by drivers towards the bridge The blamed commit added a new field to struct switchdev_notifier_fdb_info, but did not make sure that all call paths set it to something valid. For example, a switchdev driver may emit a SWITCHDEV_FDB_ADD_TO_BRIDGE notifier, and since the 'is_local' flag is not set, it contains junk from the stack, so the bridge might interpret those notifications as being for local FDB entries when that was not intended. To avoid that now and in the future, zero-initialize all switchdev_notifier_fdb_info structures created by drivers such that all newly added fields to not need to touch drivers again. Fixes: 2c4eca3e ("net: bridge: switchdev: include local flag in FDB notifications") Reported-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Karsten Graul <kgraul@linux.ibm.com> Link: https://lore.kernel.org/r/20210810115024.1629983-1-vladimir.oltean@nxp.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Nikolay Aleksandrov authored
Ignore fdb flags when adding port extern learn entries and always set BR_FDB_LOCAL flag when adding bridge extern learn entries. This is closest to the behaviour we had before and avoids breaking any use cases which were allowed. This patch fixes iproute2 calls which assume NUD_PERMANENT and were allowed before, example: $ bridge fdb add 00:11:22:33:44:55 dev swp1 extern_learn Extern learn entries are allowed to roam, but do not expire, so static or dynamic flags make no sense for them. Also add a comment for future reference. Fixes: eb100e0e ("net: bridge: allow to add externally learned entries from user-space") Fixes: 0541a629 ("net: bridge: validate the NUD_PERMANENT bit when adding an extern_learn FDB entry") Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20210810110010.43859-1-razor@blackwall.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfJakub Kicinski authored
Daniel Borkmann says: ==================== bpf 2021-08-10 We've added 5 non-merge commits during the last 2 day(s) which contain a total of 7 files changed, 27 insertions(+), 15 deletions(-). 1) Fix missing bpf_read_lock_trace() context for BPF loader progs, from Yonghong Song. 2) Fix corner case where BPF prog retrieves wrong local storage, also from Yonghong Song. 3) Restrict availability of BPF write_user helper behind lockdown, from Daniel Borkmann. 4) Fix multiple kernel-doc warnings in BPF core, from Randy Dunlap. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf, core: Fix kernel-doc notation bpf: Fix potentially incorrect results with bpf_get_local_storage() bpf: Add missing bpf_read_[un]lock_trace() for syscall program bpf: Add lockdown check for probe_write_user helper bpf: Add _kernel suffix to internal lockdown_bpf_read ==================== Link: https://lore.kernel.org/r/20210810144025.22814-1-daniel@iogearbox.netSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
David S. Miller authored
Vladimir Oltean says: ==================== Fix broken backpressure during FDB dump in DSA drivers rtnl_fdb_dump() has logic to split a dump of PF_BRIDGE neighbors into multiple netlink skbs if the buffer provided by user space is too small (one buffer will typically handle a few hundred FDB entries). When the current buffer becomes full, nlmsg_put() in dsa_slave_port_fdb_do_dump() returns -EMSGSIZE and DSA saves the index of the last dumped FDB entry, returns to rtnl_fdb_dump() up to that point, and then the dump resumes on the same port with a new skb, and FDB entries up to the saved index are simply skipped. Since dsa_slave_port_fdb_do_dump() is pointed to by the "cb" passed to drivers, then drivers must check for the -EMSGSIZE error code returned by it. Otherwise, when a netlink skb becomes full, DSA will no longer save newly dumped FDB entries to it, but the driver will continue dumping. So FDB entries will be missing from the dump. DSA is one of the few switchdev drivers that have an .ndo_fdb_dump implementation, because of the assumption that the hardware and software FDBs cannot be efficiently kept in sync via SWITCHDEV_FDB_ADD_TO_BRIDGE. Other drivers with a home-cooked .ndo_fdb_dump implementation are ocelot and dpaa2-switch. These appear to do the correct thing, as do the other DSA drivers, so nothing else appears to need fixing. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vladimir Oltean authored
rtnl_fdb_dump() has logic to split a dump of PF_BRIDGE neighbors into multiple netlink skbs if the buffer provided by user space is too small (one buffer will typically handle a few hundred FDB entries). When the current buffer becomes full, nlmsg_put() in dsa_slave_port_fdb_do_dump() returns -EMSGSIZE and DSA saves the index of the last dumped FDB entry, returns to rtnl_fdb_dump() up to that point, and then the dump resumes on the same port with a new skb, and FDB entries up to the saved index are simply skipped. Since dsa_slave_port_fdb_do_dump() is pointed to by the "cb" passed to drivers, then drivers must check for the -EMSGSIZE error code returned by it. Otherwise, when a netlink skb becomes full, DSA will no longer save newly dumped FDB entries to it, but the driver will continue dumping. So FDB entries will be missing from the dump. Fix the broken backpressure by propagating the "cb" return code and allow rtnl_fdb_dump() to restart the FDB dump with a new skb. Fixes: 291d1e72 ("net: dsa: sja1105: Add support for FDB and MDB management") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vladimir Oltean authored
rtnl_fdb_dump() has logic to split a dump of PF_BRIDGE neighbors into multiple netlink skbs if the buffer provided by user space is too small (one buffer will typically handle a few hundred FDB entries). When the current buffer becomes full, nlmsg_put() in dsa_slave_port_fdb_do_dump() returns -EMSGSIZE and DSA saves the index of the last dumped FDB entry, returns to rtnl_fdb_dump() up to that point, and then the dump resumes on the same port with a new skb, and FDB entries up to the saved index are simply skipped. Since dsa_slave_port_fdb_do_dump() is pointed to by the "cb" passed to drivers, then drivers must check for the -EMSGSIZE error code returned by it. Otherwise, when a netlink skb becomes full, DSA will no longer save newly dumped FDB entries to it, but the driver will continue dumping. So FDB entries will be missing from the dump. Fix the broken backpressure by propagating the "cb" return code and allow rtnl_fdb_dump() to restart the FDB dump with a new skb. Fixes: 58c59ef9 ("net: dsa: lantiq: Add Forwarding Database access") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vladimir Oltean authored
rtnl_fdb_dump() has logic to split a dump of PF_BRIDGE neighbors into multiple netlink skbs if the buffer provided by user space is too small (one buffer will typically handle a few hundred FDB entries). When the current buffer becomes full, nlmsg_put() in dsa_slave_port_fdb_do_dump() returns -EMSGSIZE and DSA saves the index of the last dumped FDB entry, returns to rtnl_fdb_dump() up to that point, and then the dump resumes on the same port with a new skb, and FDB entries up to the saved index are simply skipped. Since dsa_slave_port_fdb_do_dump() is pointed to by the "cb" passed to drivers, then drivers must check for the -EMSGSIZE error code returned by it. Otherwise, when a netlink skb becomes full, DSA will no longer save newly dumped FDB entries to it, but the driver will continue dumping. So FDB entries will be missing from the dump. Fix the broken backpressure by propagating the "cb" return code and allow rtnl_fdb_dump() to restart the FDB dump with a new skb. Fixes: ab335349 ("net: dsa: lan9303: Add port_fast_age and port_fdb_dump methods") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vladimir Oltean authored
rtnl_fdb_dump() has logic to split a dump of PF_BRIDGE neighbors into multiple netlink skbs if the buffer provided by user space is too small (one buffer will typically handle a few hundred FDB entries). When the current buffer becomes full, nlmsg_put() in dsa_slave_port_fdb_do_dump() returns -EMSGSIZE and DSA saves the index of the last dumped FDB entry, returns to rtnl_fdb_dump() up to that point, and then the dump resumes on the same port with a new skb, and FDB entries up to the saved index are simply skipped. Since dsa_slave_port_fdb_do_dump() is pointed to by the "cb" passed to drivers, then drivers must check for the -EMSGSIZE error code returned by it. Otherwise, when a netlink skb becomes full, DSA will no longer save newly dumped FDB entries to it, but the driver will continue dumping. So FDB entries will be missing from the dump. Fix the broken backpressure by propagating the "cb" return code and allow rtnl_fdb_dump() to restart the FDB dump with a new skb. Fixes: e4b27ebc ("net: dsa: Add DSA driver for Hirschmann Hellcreek switches") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Randy Dunlap authored
Fix kernel-doc warnings in kernel/bpf/core.c (found by scripts/kernel-doc and W=1 builds). That is, correct a function name in a comment and add return descriptions for 2 functions. Fixes these kernel-doc warnings: kernel/bpf/core.c:1372: warning: expecting prototype for __bpf_prog_run(). Prototype was for ___bpf_prog_run() instead kernel/bpf/core.c:1372: warning: No description found for return value of '___bpf_prog_run' kernel/bpf/core.c:1883: warning: No description found for return value of 'bpf_prog_select_runtime' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210809215229.7556-1-rdunlap@infradead.org
-
Eric Dumazet authored
Fix the data-race reported by syzbot [1] Issue here is that igmp_ifc_timer_expire() can update in_dev->mr_ifc_count while another change just occured from another context. in_dev->mr_ifc_count is only 8bit wide, so the race had little consequences. [1] BUG: KCSAN: data-race in igmp_ifc_event / igmp_ifc_timer_expire write to 0xffff8881051e3062 of 1 bytes by task 12547 on cpu 0: igmp_ifc_event+0x1d5/0x290 net/ipv4/igmp.c:821 igmp_group_added+0x462/0x490 net/ipv4/igmp.c:1356 ____ip_mc_inc_group+0x3ff/0x500 net/ipv4/igmp.c:1461 __ip_mc_join_group+0x24d/0x2c0 net/ipv4/igmp.c:2199 ip_mc_join_group_ssm+0x20/0x30 net/ipv4/igmp.c:2218 do_ip_setsockopt net/ipv4/ip_sockglue.c:1285 [inline] ip_setsockopt+0x1827/0x2a80 net/ipv4/ip_sockglue.c:1423 tcp_setsockopt+0x8c/0xa0 net/ipv4/tcp.c:3657 sock_common_setsockopt+0x5d/0x70 net/core/sock.c:3362 __sys_setsockopt+0x18f/0x200 net/socket.c:2159 __do_sys_setsockopt net/socket.c:2170 [inline] __se_sys_setsockopt net/socket.c:2167 [inline] __x64_sys_setsockopt+0x62/0x70 net/socket.c:2167 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae read to 0xffff8881051e3062 of 1 bytes by interrupt on cpu 1: igmp_ifc_timer_expire+0x706/0xa30 net/ipv4/igmp.c:808 call_timer_fn+0x2e/0x1d0 kernel/time/timer.c:1419 expire_timers+0x135/0x250 kernel/time/timer.c:1464 __run_timers+0x358/0x420 kernel/time/timer.c:1732 run_timer_softirq+0x19/0x30 kernel/time/timer.c:1745 __do_softirq+0x12c/0x26e kernel/softirq.c:558 invoke_softirq kernel/softirq.c:432 [inline] __irq_exit_rcu+0x9a/0xb0 kernel/softirq.c:636 sysvec_apic_timer_interrupt+0x69/0x80 arch/x86/kernel/apic/apic.c:1100 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:638 console_unlock+0x8e8/0xb30 kernel/printk/printk.c:2646 vprintk_emit+0x125/0x3d0 kernel/printk/printk.c:2174 vprintk_default+0x22/0x30 kernel/printk/printk.c:2185 vprintk+0x15a/0x170 kernel/printk/printk_safe.c:392 printk+0x62/0x87 kernel/printk/printk.c:2216 selinux_netlink_send+0x399/0x400 security/selinux/hooks.c:6041 security_netlink_send+0x42/0x90 security/security.c:2070 netlink_sendmsg+0x59e/0x7c0 net/netlink/af_netlink.c:1919 sock_sendmsg_nosec net/socket.c:703 [inline] sock_sendmsg net/socket.c:723 [inline] ____sys_sendmsg+0x360/0x4d0 net/socket.c:2392 ___sys_sendmsg net/socket.c:2446 [inline] __sys_sendmsg+0x1ed/0x270 net/socket.c:2475 __do_sys_sendmsg net/socket.c:2484 [inline] __se_sys_sendmsg net/socket.c:2482 [inline] __x64_sys_sendmsg+0x42/0x50 net/socket.c:2482 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae value changed: 0x01 -> 0x02 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 12539 Comm: syz-executor.1 Not tainted 5.14.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Fixes: 1da177e4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Ben Hutchings says: ==================== ksz8795 VLAN fixes This series fixes a number of bugs in the ksz8795 driver that affect VLAN filtering, tag insertion, and tag removal. I've tested these on the KSZ8795CLXD evaluation board, and checked the register usage against the datasheets for the other supported chips. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ben Hutchings authored
The magic number 4 in VLAN table lookup was the number of entries we can read and write at once. Using phy_port_cnt here doesn't make sense and presumably broke VLAN filtering for 3-port switches. Change it back to 4. Fixes: 4ce2a984 ("net: dsa: microchip: ksz8795: use phy_port_cnt ...") Signed-off-by: Ben Hutchings <ben.hutchings@mind.be> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ben Hutchings authored
Currently ksz8_port_vlan_filtering() sets or clears the VLAN Enable hardware flag. That controls discarding of packets with a VID that has not been enabled for any port on the switch. Since it is a global flag, set the dsa_switch::vlan_filtering_is_global flag so that the DSA core understands this can't be controlled per port. When VLAN filtering is enabled, the switch should also discard packets with a VID that's not enabled on the ingress port. Set or clear each external port's VLAN Ingress Filter flag in ksz8_port_vlan_filtering() to make that happen. Fixes: e66f840c ("net: dsa: ksz: Add Microchip KSZ8795 DSA driver") Signed-off-by: Ben Hutchings <ben.hutchings@mind.be> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ben Hutchings authored
On the CPU port, we can support both tagged and untagged VLANs at the same time by doing any necessary untagging in software rather than hardware. To enable that, keep the CPU port's Remove Tag flag cleared and set the dsa_switch::untag_bridge_pvid flag. Fixes: e66f840c ("net: dsa: ksz: Add Microchip KSZ8795 DSA driver") Signed-off-by: Ben Hutchings <ben.hutchings@mind.be> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ben Hutchings authored
When a VLAN is deleted from a port, the flags in struct switchdev_obj_port_vlan are always 0. ksz8_port_vlan_del() copies the BRIDGE_VLAN_INFO_UNTAGGED flag to the port's Tag Removal flag, and therefore always clears it. In case there are multiple VLANs configured as untagged on this port - which seems useless, but is allowed - deleting one of them changes the remaining VLANs to be tagged. It's only ever necessary to change this flag when a VLAN is added to the port, so leave it unchanged in ksz8_port_vlan_del(). Fixes: e66f840c ("net: dsa: ksz: Add Microchip KSZ8795 DSA driver") Signed-off-by: Ben Hutchings <ben.hutchings@mind.be> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ben Hutchings authored
The switches supported by ksz8795 only have a per-port flag for Tag Removal. This means it is not possible to support both tagged and untagged VLANs on the same port. Reject attempts to add a VLAN that requires the flag to be changed, unless there are no VLANs currently configured. VID 0 is excluded from this check since it is untagged regardless of the state of the flag. On the CPU port we could support tagged and untagged VLANs at the same time. This will be enabled by a later patch. Fixes: e66f840c ("net: dsa: ksz: Add Microchip KSZ8795 DSA driver") Signed-off-by: Ben Hutchings <ben.hutchings@mind.be> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ben Hutchings authored
ksz8795 has never actually enabled PVID tag insertion, and it also programmed the PVID incorrectly. To fix this: * Allow tag insertion to be controlled per ingress port. On most chips, set bit 2 in Global Control 19. On KSZ88x3 this control flag doesn't exist. * When adding a PVID: - Set the appropriate register bits to enable tag insertion on egress at every other port if this was the packet's ingress port. - Mask *out* the VID from the default tag, before or-ing in the new PVID. * When removing a PVID: - Clear the same control bits to disable tag insertion. - Don't update the default tag. This wasn't doing anything useful. Fixes: e66f840c ("net: dsa: ksz: Add Microchip KSZ8795 DSA driver") Signed-off-by: Ben Hutchings <ben.hutchings@mind.be> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ben Hutchings authored
ksz_read64() currently does some dubious byte-swapping on the two halves of a 64-bit register, and then only returns the high bits. Replace this with a straightforward expression. Fixes: e66f840c ("net: dsa: ksz: Add Microchip KSZ8795 DSA driver") Signed-off-by: Ben Hutchings <ben.hutchings@mind.be> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Merge tag 'linux-can-fixes-for-5.14-20210810' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can linux-can-fixes-for-5.14-20210810 Marc Kleine-Budde says: ==================== pull-request: can 2021-08-10 this is a pull request of 2 patches for net/master. Baruch Siach's patch fixes a typo for the Microchip CAN BUS Analyzer Tool entry in the MAINTAINERS file. Hussein Alasadi fixes the setting of the M_CAN_DBTP register in the m_can driver. The regression git mainline in v5.14-rc1, so no backport to stable is needed. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linuxDavid S. Miller authored
Saeed Mahameed says: ==================== mlx5 fixes 2021-08-09 This series introduces fixes to mlx5 driver. Please pull and let me know if there is any problem. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queueDavid S. Miller authored
Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2021-08-09 This series contains updates to ice and iavf drivers. Ani prevents the ice driver from accidentally being probed to a virtual function and stops processing of VF messages when VFs are being torn down. Brett prevents the ice driver from deleting is own MAC address. Fahad ensures the RSS LUT and key are always set following reset for iavf. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yonghong Song authored
Commit b910eaaa ("bpf: Fix NULL pointer dereference in bpf_get_local_storage() helper") fixed a bug for bpf_get_local_storage() helper so different tasks won't mess up with each other's percpu local storage. The percpu data contains 8 slots so it can hold up to 8 contexts (same or different tasks), for 8 different program runs, at the same time. This in general is sufficient. But our internal testing showed the following warning multiple times: [...] warning: WARNING: CPU: 13 PID: 41661 at include/linux/bpf-cgroup.h:193 __cgroup_bpf_run_filter_sock_ops+0x13e/0x180 RIP: 0010:__cgroup_bpf_run_filter_sock_ops+0x13e/0x180 <IRQ> tcp_call_bpf.constprop.99+0x93/0xc0 tcp_conn_request+0x41e/0xa50 ? tcp_rcv_state_process+0x203/0xe00 tcp_rcv_state_process+0x203/0xe00 ? sk_filter_trim_cap+0xbc/0x210 ? tcp_v6_inbound_md5_hash.constprop.41+0x44/0x160 tcp_v6_do_rcv+0x181/0x3e0 tcp_v6_rcv+0xc65/0xcb0 ip6_protocol_deliver_rcu+0xbd/0x450 ip6_input_finish+0x11/0x20 ip6_input+0xb5/0xc0 ip6_sublist_rcv_finish+0x37/0x50 ip6_sublist_rcv+0x1dc/0x270 ipv6_list_rcv+0x113/0x140 __netif_receive_skb_list_core+0x1a0/0x210 netif_receive_skb_list_internal+0x186/0x2a0 gro_normal_list.part.170+0x19/0x40 napi_complete_done+0x65/0x150 mlx5e_napi_poll+0x1ae/0x680 __napi_poll+0x25/0x120 net_rx_action+0x11e/0x280 __do_softirq+0xbb/0x271 irq_exit_rcu+0x97/0xa0 common_interrupt+0x7f/0xa0 </IRQ> asm_common_interrupt+0x1e/0x40 RIP: 0010:bpf_prog_1835a9241238291a_tw_egress+0x5/0xbac ? __cgroup_bpf_run_filter_skb+0x378/0x4e0 ? do_softirq+0x34/0x70 ? ip6_finish_output2+0x266/0x590 ? ip6_finish_output+0x66/0xa0 ? ip6_output+0x6c/0x130 ? ip6_xmit+0x279/0x550 ? ip6_dst_check+0x61/0xd0 [...] Using drgn [0] to dump the percpu buffer contents showed that on this CPU slot 0 is still available, but slots 1-7 are occupied and those tasks in slots 1-7 mostly don't exist any more. So we might have issues in bpf_cgroup_storage_unset(). Further debugging confirmed that there is a bug in bpf_cgroup_storage_unset(). Currently, it tries to unset "current" slot with searching from the start. So the following sequence is possible: 1. A task is running and claims slot 0 2. Running BPF program is done, and it checked slot 0 has the "task" and ready to reset it to NULL (not yet). 3. An interrupt happens, another BPF program runs and it claims slot 1 with the *same* task. 4. The unset() in interrupt context releases slot 0 since it matches "task". 5. Interrupt is done, the task in process context reset slot 0. At the end, slot 1 is not reset and the same process can continue to occupy slots 2-7 and finally, when the above step 1-5 is repeated again, step 3 BPF program won't be able to claim an empty slot and a warning will be issued. To fix the issue, for unset() function, we should traverse from the last slot to the first. This way, the above issue can be avoided. The same reverse traversal should also be done in bpf_get_local_storage() helper itself. Otherwise, incorrect local storage may be returned to BPF program. [0] https://github.com/osandov/drgn Fixes: b910eaaa ("bpf: Fix NULL pointer dereference in bpf_get_local_storage() helper") Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210810010413.1976277-1-yhs@fb.com
-
Yonghong Song authored
Commit 79a7f8bd ("bpf: Introduce bpf_sys_bpf() helper and program type.") added support for syscall program, which is a sleepable program. But the program run missed bpf_read_lock_trace()/bpf_read_unlock_trace(), which is needed to ensure proper rcu callback invocations. This patch adds bpf_read_[un]lock_trace() properly. Fixes: 79a7f8bd ("bpf: Introduce bpf_sys_bpf() helper and program type.") Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210809235151.1663680-1-yhs@fb.com
-
Daniel Borkmann authored
Back then, commit 96ae5227 ("bpf: Add bpf_probe_write_user BPF helper to be called in tracers") added the bpf_probe_write_user() helper in order to allow to override user space memory. Its original goal was to have a facility to "debug, divert, and manipulate execution of semi-cooperative processes" under CAP_SYS_ADMIN. Write to kernel was explicitly disallowed since it would otherwise tamper with its integrity. One use case was shown in cf9b1199 ("samples/bpf: Add test/example of using bpf_probe_write_user bpf helper") where the program DNATs traffic at the time of connect(2) syscall, meaning, it rewrites the arguments to a syscall while they're still in userspace, and before the syscall has a chance to copy the argument into kernel space. These days we have better mechanisms in BPF for achieving the same (e.g. for load-balancers), but without having to write to userspace memory. Of course the bpf_probe_write_user() helper can also be used to abuse many other things for both good or bad purpose. Outside of BPF, there is a similar mechanism for ptrace(2) such as PTRACE_PEEK{TEXT,DATA} and PTRACE_POKE{TEXT,DATA}, but would likely require some more effort. Commit 96ae5227 explicitly dedicated the helper for experimentation purpose only. Thus, move the helper's availability behind a newly added LOCKDOWN_BPF_WRITE_USER lockdown knob so that the helper is disabled under the "integrity" mode. More fine-grained control can be implemented also from LSM side with this change. Fixes: 96ae5227 ("bpf: Add bpf_probe_write_user BPF helper to be called in tracers") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org>
-
Hussein Alasadi authored
This patch fixes the setting of the M_CAN_DBTP register contents: - use DBTP_ (the data bitrate macros) instead of NBTP_ which area used for the nominal bitrate - do not overwrite possibly-existing DBTP_TDC flag by ORing reg_btp instead of overwriting Link: https://lore.kernel.org/r/FRYP281MB06140984ABD9994C0AAF7433D1F69@FRYP281MB0614.DEUP281.PROD.OUTLOOK.COM Fixes: 20779943 ("can: m_can: use bits.h macros for all regmasks") Cc: Torin Cooper-Bennun <torin@maxiluxsystems.com> Cc: Chandrasekar Ramakrishnan <rcsekar@samsung.com> Signed-off-by: Hussein Alasadi <alasadi@arecs.eu> [mkl: update patch description, update indention] Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
-
Baruch Siach authored
This patch fixes the abbreviated name of the Microchip CAN BUS Analyzer Tool. Fixes: 8a7b46fa ("MAINTAINERS: add Yasushi SHOJI as reviewer for the Microchip CAN BUS Analyzer Tool driver") Link: https://lore.kernel.org/r/cc4831cb1c8759c15fb32c21fd326e831183733d.1627876781.git.baruch@tkos.co.ilSigned-off-by: Baruch Siach <baruch@tkos.co.il> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
-
Aya Levin authored
Check return value of mlx5_fw_tracer_start(), set error path and fix return value of mlx5_fw_tracer_init() accordingly. Fixes: c71ad41c ("net/mlx5: FW tracer, events handling") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Shay Drory authored
The CQ destroy is performed based on the IRQ number that is stored in cq->irqn. That number wasn't set explicitly during CQ creation and as expected some of the API users of mlx5_core_create_cq() forgot to update it. This caused to wrong synchronization call of the wrong IRQ with a number 0 instead of the real one. As a fix, set the IRQ number directly in the mlx5_core_create_cq() and update all users accordingly. Fixes: 1a86b377 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices") Fixes: ef1659ad ("IB/mlx5: Add DEVX support for CQ events") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Chris Mi authored
Free the offload sample action on error. Fixes: f94d6389 ("net/mlx5e: TC, Add support to offload sample action") Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Shay Drory authored
Destroy pool->mutex when we destroy the pool. Fixes: c36326d3 ("net/mlx5: Round-Robin EQs over IRQs") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Shay Drory authored
Currently irq->pool is set after the irq is insert to the xarray. Set irq->pool before the irq is inserted to the xarray. Fixes: 71e084e2 ("net/mlx5: Allocating a pool of MSI-X vectors for SFs") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Shay Drory authored
Change order of functions in mlx5_irq_detach_nb() so it will be a mirror of mlx5_irq_attach_nb. Fixes: 71e084e2 ("net/mlx5: Allocating a pool of MSI-X vectors for SFs") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Aya Levin authored
Since switchdev mode can't support devlink traps, verify there are no active devlink traps before moving eswitch to switchdev mode. If there are active traps, prevent the switchdev mode configuration. Fixes: eb3862a0 ("net/mlx5e: Enable traps according to link state") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Maxim Mikityanskiy authored
mlx5e_close_xdpsq does the cleanup: it calls mlx5e_free_xdpsq_descs to free the outstanding descriptors, which relies on mlx5e_page_release_dynamic and page_pool_release_page. However, page_pool_destroy is already called by this point, because mlx5e_close_rq runs before mlx5e_close_xdpsq. This commit fixes the use-after-free by swapping mlx5e_close_xdpsq and mlx5e_close_rq. The commit cited below started calling page_pool_destroy directly from the driver. Previously, the page pool was destroyed under a call_rcu from xdp_rxq_info_unreg_mem_model, which would defer the deallocation until after the XDPSQ is cleaned up. Fixes: 1da4bbef ("net: core: page_pool: add user refcnt and reintroduce page_pool_destroy") Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Vlad Buslov authored
Ageing time is not converted from clock_t to jiffies which results incorrect ageing timeout calculation in workqueue update task. Fix it by applying clock_t_to_jiffies() to provided value. Fixes: c636a0f0 ("net/mlx5: Bridge, dynamic entry ageing") Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Roi Dayan authored
It could be local and remote are on the same machine and the route result will be a local route which will result in creating encap id with src/dst mac address of 0. Fixes: a54e20b4 ("net/mlx5e: Add basic TC tunnel set action for SRIOV offloads") Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Alex Vesker authored
While processing encapsulated packet on RX, one of the fields that is checked is the inner packet length. If the length as specified in the header doesn't match the actual inner packet length, the packet is invalid and should be dropped. However, such packet caused the NIC to hang. This patch turns on a 'fail_on_error' HW bit which allows HW to drop such an invalid packet while processing RX packet and trying to decap it. Fixes: ad17dc8c ("net/mlx5: DR, Move STEv0 action apply logic") Signed-off-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Leon Romanovsky authored
Clean SF resources if mlx5 eth failed to initialize. Fixes: 1958fc2f ("net/mlx5: SF, Add auxiliary device driver") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
- 09 Aug, 2021 1 commit
-
-
Guillaume Nault authored
Data beyond the UDP header might not be part of the skb's linear data. Use skb_copy_bits() instead of direct access to skb->data+X, so that we read the correct bytes even on a fragmented skb. Fixes: 4b5f6723 ("net: Special handling for IP & MPLS.") Signed-off-by: Guillaume Nault <gnault@redhat.com> Link: https://lore.kernel.org/r/7741c46545c6ef02e70c80a9b32814b22d9616b3.1628264975.git.gnault@redhat.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-