- 02 Sep, 2022 8 commits
-
-
Ivan Vecera authored
iavf_reset_task() takes crit_lock at the beginning and holds it during whole call. The function subsequently calls iavf_init_interrupt_scheme() that grabs RTNL. Problem occurs when userspace initiates during the reset task any ndo callback that runs under RTNL like iavf_open() because some of that functions tries to take crit_lock. This leads to classic A-B B-A deadlock scenario. To resolve this situation the device should be detached in iavf_reset_task() prior taking crit_lock to avoid subsequent ndos running under RTNL and reattach the device at the end. Fixes: 62fe2a86 ("i40evf: add missing rtnl_lock() around i40evf_set_interrupt_capability") Cc: Jacob Keller <jacob.e.keller@intel.com> Cc: Patryk Piotrowski <patryk.piotrowski@intel.com> Cc: SlawomirX Laba <slawomirx.laba@intel.com> Tested-by: Vitaly Grinberg <vgrinber@redhat.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-
Ivan Vecera authored
The driver incorrectly frees client instance and subsequent i40e module removal leads to kernel crash. Reproducer: 1. Do ethtool offline test followed immediately by another one host# ethtool -t eth0 offline; ethtool -t eth0 offline 2. Remove recursively irdma module that also removes i40e module host# modprobe -r irdma Result: [ 8675.035651] i40e 0000:3d:00.0 eno1: offline testing starting [ 8675.193774] i40e 0000:3d:00.0 eno1: testing finished [ 8675.201316] i40e 0000:3d:00.0 eno1: offline testing starting [ 8675.358921] i40e 0000:3d:00.0 eno1: testing finished [ 8675.496921] i40e 0000:3d:00.0: IRDMA hardware initialization FAILED init_state=2 status=-110 [ 8686.188955] i40e 0000:3d:00.1: i40e_ptp_stop: removed PHC on eno2 [ 8686.943890] i40e 0000:3d:00.1: Deleted LAN device PF1 bus=0x3d dev=0x00 func=0x01 [ 8686.952669] i40e 0000:3d:00.0: i40e_ptp_stop: removed PHC on eno1 [ 8687.761787] BUG: kernel NULL pointer dereference, address: 0000000000000030 [ 8687.768755] #PF: supervisor read access in kernel mode [ 8687.773895] #PF: error_code(0x0000) - not-present page [ 8687.779034] PGD 0 P4D 0 [ 8687.781575] Oops: 0000 [#1] PREEMPT SMP NOPTI [ 8687.785935] CPU: 51 PID: 172891 Comm: rmmod Kdump: loaded Tainted: G W I 5.19.0+ #2 [ 8687.794800] Hardware name: Intel Corporation S2600WFD/S2600WFD, BIOS SE5C620.86B.0X.02.0001.051420190324 05/14/2019 [ 8687.805222] RIP: 0010:i40e_lan_del_device+0x13/0xb0 [i40e] [ 8687.810719] Code: d4 84 c0 0f 84 b8 25 01 00 e9 9c 25 01 00 41 bc f4 ff ff ff eb 91 90 0f 1f 44 00 00 41 54 55 53 48 8b 87 58 08 00 00 48 89 fb <48> 8b 68 30 48 89 ef e8 21 8a 0f d5 48 89 ef e8 a9 78 0f d5 48 8b [ 8687.829462] RSP: 0018:ffffa604072efce0 EFLAGS: 00010202 [ 8687.834689] RAX: 0000000000000000 RBX: ffff8f43833b2000 RCX: 0000000000000000 [ 8687.841821] RDX: 0000000000000000 RSI: ffff8f4b0545b298 RDI: ffff8f43833b2000 [ 8687.848955] RBP: ffff8f43833b2000 R08: 0000000000000001 R09: 0000000000000000 [ 8687.856086] R10: 0000000000000000 R11: 000ffffffffff000 R12: ffff8f43833b2ef0 [ 8687.863218] R13: ffff8f43833b2ef0 R14: ffff915103966000 R15: ffff8f43833b2008 [ 8687.870342] FS: 00007f79501c3740(0000) GS:ffff8f4adffc0000(0000) knlGS:0000000000000000 [ 8687.878427] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 8687.884174] CR2: 0000000000000030 CR3: 000000014276e004 CR4: 00000000007706e0 [ 8687.891306] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 8687.898441] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 8687.905572] PKRU: 55555554 [ 8687.908286] Call Trace: [ 8687.910737] <TASK> [ 8687.912843] i40e_remove+0x2c0/0x330 [i40e] [ 8687.917040] pci_device_remove+0x33/0xa0 [ 8687.920962] device_release_driver_internal+0x1aa/0x230 [ 8687.926188] driver_detach+0x44/0x90 [ 8687.929770] bus_remove_driver+0x55/0xe0 [ 8687.933693] pci_unregister_driver+0x2a/0xb0 [ 8687.937967] i40e_exit_module+0xc/0xf48 [i40e] Two offline tests cause IRDMA driver failure (ETIMEDOUT) and this failure is indicated back to i40e_client_subtask() that calls i40e_client_del_instance() to free client instance referenced by pf->cinst and sets this pointer to NULL. During the module removal i40e_remove() calls i40e_lan_del_device() that dereferences pf->cinst that is NULL -> crash. Do not remove client instance when client open callbacks fails and just clear __I40E_CLIENT_INSTANCE_OPENED bit. The driver also needs to take care about this situation (when netdev is up and client is NOT opened) in i40e_notify_client_of_netdev_close() and calls client close callback only when __I40E_CLIENT_INSTANCE_OPENED is set. Fixes: 0ef2d5af ("i40e: KISS the client interface") Signed-off-by: Ivan Vecera <ivecera@redhat.com> Tested-by: Helena Anna Dubel <helena.anna.dubel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-
Przemyslaw Patynowski authored
Fix HW rate limiting for ADQ. Fallback to kernel queue selection for ADQ, as it is network stack that decides which queue to use for transmit with ADQ configured. Reset PF after creation of VMDq2 VSIs required for ADQ, as to reprogram TX queue contexts in i40e_configure_tx_ring. Without this patch PF would limit TX rate only according to TC0. Fixes: a9ce82f7 ("i40e: Enable 'channel' mode in mqprio for TC configs") Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com> Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com> Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fsDavid S. Miller authored
David Howells says: ==================== rxrpc fixes Here are some fixes for AF_RXRPC: (1) Fix the handling of ICMP/ICMP6 packets. This is a problem due to rxrpc being switched to acting as a UDP tunnel, thereby allowing it to steal the packets before they go through the UDP Rx queue. UDP tunnels can't get ICMP/ICMP6 packets, however. This patch adds an additional encap hook so that they can. (2) Fix the encryption routines in rxkad to handle packets that have more than three parts correctly. The problem is that ->nr_frags doesn't count the initial fragment, so the sglist ends up too short. (3) Fix a problem with destruction of the local endpoint potentially getting repeated. (4) Fix the calculation of the time at which to resend. jiffies_to_usecs() gives microseconds, not nanoseconds. (5) Fix AFS to work out when callback promises and locks expire based on the time an op was issued rather than the time the first reply packet arrives. We don't know how long the server took between calculating the expiry interval and transmitting the reply. (6) Given (5), rxrpc_get_reply_time() is no longer used, so remove it. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
We got a recent syzbot report [1] showing a possible misuse of pfmemalloc page status in TCP zerocopy paths. Indeed, for pages coming from user space or other layers, using page_is_pfmemalloc() is moot, and possibly could give false positives. There has been attempts to make page_is_pfmemalloc() more robust, but not using it in the first place in this context is probably better, removing cpu cycles. Note to stable teams : You need to backport 84ce071e ("net: introduce __skb_fill_page_desc_noacc") as a prereq. Race is more probable after commit c07aea3e ("mm: add a signature in struct page") because page_is_pfmemalloc() is now using low order bit from page->lru.next, which can change more often than page->index. Low order bit should never be set for lru.next (when used as an anchor in LRU list), so KCSAN report is mostly a false positive. Backporting to older kernel versions seems not necessary. [1] BUG: KCSAN: data-race in lru_add_fn / tcp_build_frag write to 0xffffea0004a1d2c8 of 8 bytes by task 18600 on cpu 0: __list_add include/linux/list.h:73 [inline] list_add include/linux/list.h:88 [inline] lruvec_add_folio include/linux/mm_inline.h:105 [inline] lru_add_fn+0x440/0x520 mm/swap.c:228 folio_batch_move_lru+0x1e1/0x2a0 mm/swap.c:246 folio_batch_add_and_move mm/swap.c:263 [inline] folio_add_lru+0xf1/0x140 mm/swap.c:490 filemap_add_folio+0xf8/0x150 mm/filemap.c:948 __filemap_get_folio+0x510/0x6d0 mm/filemap.c:1981 pagecache_get_page+0x26/0x190 mm/folio-compat.c:104 grab_cache_page_write_begin+0x2a/0x30 mm/folio-compat.c:116 ext4_da_write_begin+0x2dd/0x5f0 fs/ext4/inode.c:2988 generic_perform_write+0x1d4/0x3f0 mm/filemap.c:3738 ext4_buffered_write_iter+0x235/0x3e0 fs/ext4/file.c:270 ext4_file_write_iter+0x2e3/0x1210 call_write_iter include/linux/fs.h:2187 [inline] new_sync_write fs/read_write.c:491 [inline] vfs_write+0x468/0x760 fs/read_write.c:578 ksys_write+0xe8/0x1a0 fs/read_write.c:631 __do_sys_write fs/read_write.c:643 [inline] __se_sys_write fs/read_write.c:640 [inline] __x64_sys_write+0x3e/0x50 fs/read_write.c:640 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd read to 0xffffea0004a1d2c8 of 8 bytes by task 18611 on cpu 1: page_is_pfmemalloc include/linux/mm.h:1740 [inline] __skb_fill_page_desc include/linux/skbuff.h:2422 [inline] skb_fill_page_desc include/linux/skbuff.h:2443 [inline] tcp_build_frag+0x613/0xb20 net/ipv4/tcp.c:1018 do_tcp_sendpages+0x3e8/0xaf0 net/ipv4/tcp.c:1075 tcp_sendpage_locked net/ipv4/tcp.c:1140 [inline] tcp_sendpage+0x89/0xb0 net/ipv4/tcp.c:1150 inet_sendpage+0x7f/0xc0 net/ipv4/af_inet.c:833 kernel_sendpage+0x184/0x300 net/socket.c:3561 sock_sendpage+0x5a/0x70 net/socket.c:1054 pipe_to_sendpage+0x128/0x160 fs/splice.c:361 splice_from_pipe_feed fs/splice.c:415 [inline] __splice_from_pipe+0x222/0x4d0 fs/splice.c:559 splice_from_pipe fs/splice.c:594 [inline] generic_splice_sendpage+0x89/0xc0 fs/splice.c:743 do_splice_from fs/splice.c:764 [inline] direct_splice_actor+0x80/0xa0 fs/splice.c:931 splice_direct_to_actor+0x305/0x620 fs/splice.c:886 do_splice_direct+0xfb/0x180 fs/splice.c:974 do_sendfile+0x3bf/0x910 fs/read_write.c:1249 __do_sys_sendfile64 fs/read_write.c:1317 [inline] __se_sys_sendfile64 fs/read_write.c:1303 [inline] __x64_sys_sendfile64+0x10c/0x150 fs/read_write.c:1303 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd value changed: 0x0000000000000000 -> 0xffffea0004a1d288 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 18611 Comm: syz-executor.4 Not tainted 6.0.0-rc2-syzkaller-00248-ge022620b-dirty #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/22/2022 Fixes: c07aea3e ("mm: add a signature in struct page") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Shakeel Butt <shakeelb@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Dan Carpenter authored
There is a shift wrapping bug in this code so anything thing above 31 will return false. Fixes: 35c55c98 ("tipc: add neighbor monitoring framework") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Toke Høiland-Jørgensen authored
The sch_sfb enqueue() routine assumes the skb is still alive after it has been enqueued into a child qdisc, using the data in the skb cb field in the increment_qlen() routine after enqueue. However, the skb may in fact have been freed, causing a use-after-free in this case. In particular, this happens if sch_cake is used as a child of sfb, and the GSO splitting mode of CAKE is enabled (in which case the skb will be split into segments and the original skb freed). Fix this by copying the sfb cb data to the stack before enqueueing the skb, and using this stack copy in increment_qlen() instead of the skb pointer itself. Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-18231 Fixes: e13e02a3 ("net_sched: SFB flow scheduler") Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Heiner Kallweit authored
This reverts commit 2c87c6f9. Meanwhile it turned out that the following commit is the proper workaround for the issue that 2c87c6f9 tries to address. a3a57bf0 ("net: stmmac: work around sporadic tx issue on link-up") It's nor clear why the to be reverted commit helped for one user, for others it didn't make a difference. Fixes: 2c87c6f9 ("net: phy: meson-gxl: improve link-up behavior") Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/8deeeddc-6b71-129b-1918-495a12dc11e3@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 01 Sep, 2022 17 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds authored
Pull networking fixes from Paolo Abeni: "Including fixes from bluetooth, bpf and wireless. Current release - regressions: - bpf: - fix wrong last sg check in sk_msg_recvmsg() - fix kernel BUG in purge_effective_progs() - mac80211: - fix possible leak in ieee80211_tx_control_port() - potential NULL dereference in ieee80211_tx_control_port() Current release - new code bugs: - nfp: fix the access to management firmware hanging Previous releases - regressions: - ip: fix triggering of 'icmp redirect' - sched: tbf: don't call qdisc_put() while holding tree lock - bpf: fix corrupted packets for XDP_SHARED_UMEM - bluetooth: hci_sync: fix suspend performance regression - micrel: fix probe failure Previous releases - always broken: - tcp: make global challenge ack rate limitation per net-ns and default disabled - tg3: fix potential hang-up on system reboot - mac802154: fix reception for no-daddr packets Misc: - r8152: add PID for the lenovo onelink+ dock" * tag 'net-6.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (56 commits) net/smc: Remove redundant refcount increase Revert "sch_cake: Return __NET_XMIT_STOLEN when consuming enqueued skb" tcp: make global challenge ack rate limitation per net-ns and default disabled tcp: annotate data-race around challenge_timestamp net: dsa: hellcreek: Print warning only once ip: fix triggering of 'icmp redirect' sch_cake: Return __NET_XMIT_STOLEN when consuming enqueued skb selftests: net: sort .gitignore file Documentation: networking: correct possessive "its" kcm: fix strp_init() order and cleanup mlxbf_gige: compute MDIO period based on i1clk ethernet: rocker: fix sleep in atomic context bug in neigh_timer_handler net: lan966x: improve error handle in lan966x_fdma_rx_get_frame() nfp: fix the access to management firmware hanging net: phy: micrel: Make the GPIO to be non-exclusive net: virtio_net: fix notification coalescing comments net/sched: fix netdevice reference leaks in attach_default_qdiscs() net: sched: tbf: don't call qdisc_put() while holding tree lock net: Use u64_stats_fetch_begin_irq() for stats fetch. net: dsa: xrs700x: Use irqsave variant for u64 stats update ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slabLinus Torvalds authored
Pull slab fix from Vlastimil Babka: - A fix from Waiman Long to avoid a theoretical deadlock reported by lockdep. * tag 'slab-for-6.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: mm/slab_common: Deleting kobject in kmem_cache_destroy() without holding slab_mutex/cpu_hotplug_lock
-
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/soundLinus Torvalds authored
Pull sound fixes from Takashi Iwai: "Just handful changes at this time. The only major change is the regression fix about the x86 WC-page buffer allocation. The rest are trivial data-race fixes for ALSA sequencer core, the possible out-of-bounds access fixes in the new ALSA control hash code, and a few device-specific workarounds and fixes" * tag 'sound-6.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: usb-audio: Add quirk for LH Labs Geek Out HD Audio 1V5 ALSA: hda/realtek: Add speaker AMP init for Samsung laptops with ALC298 ALSA: control: Re-order bounds checking in get_ctl_id_hash() ALSA: control: Fix an out-of-bounds bug in get_ctl_id_hash() ALSA: hda: intel-nhlt: Correct the handling of fmt_config flexible array ALSA: seq: Fix data-race at module auto-loading ALSA: seq: oss: Fix data-race for max_midi_devs access ALSA: memalloc: Revive x86-specific WC page allocations again
-
David Howells authored
Remove rxrpc_get_reply_time() as that is no longer used now that the call issue time is used instead of the reply time. Signed-off-by: David Howells <dhowells@redhat.com>
-
David Howells authored
rxrpc and kafs between them try to use the receive timestamp on the first data packet (ie. the one with sequence number 1) as a base from which to calculate the time at which callback promise and lock expiration occurs. However, we don't know how long it took for the server to send us the reply from it having completed the basic part of the operation - it might then, for instance, have to send a bunch of a callback breaks, depending on the particular operation. Fix this by using the time at which the operation is issued on the client as a base instead. That should never be longer than the server's idea of the expiry time. Fixes: 78107055 ("afs: Fix calculation of callback expiry time") Fixes: 2070a3e4 ("rxrpc: Allow the reply time to be obtained on a client call") Suggested-by: Jeffrey E Altman <jaltman@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com>
-
David Howells authored
Fix the calculation of the resend age to add a microsecond value as microseconds, not nanoseconds. Signed-off-by: David Howells <dhowells@redhat.com>
-
David Howells authored
If the local processor work item for the rxrpc local endpoint gets requeued by an event (such as an incoming packet) between it getting scheduled for destruction and the UDP socket being closed, the rxrpc_local_destroyer() function can get run twice. The second time it can hang because it can end up waiting for cleanup events that will never happen. Signed-off-by: David Howells <dhowells@redhat.com>
-
David Howells authored
rxkad_verify_packet_2() has a small stack-allocated sglist of 4 elements, but if that isn't sufficient for the number of fragments in the socket buffer, we try to allocate an sglist large enough to hold all the fragments. However, for large packets with a lot of fragments, this isn't sufficient and we need at least one additional fragment. The problem manifests as skb_to_sgvec() returning -EMSGSIZE and this then getting returned by userspace. Most of the time, this isn't a problem as rxrpc sets a limit of 5692, big enough for 4 jumbo subpackets to be glued together; occasionally, however, the server will ignore the reported limit and give a packet that's a lot bigger - say 19852 bytes with ->nr_frags being 7. skb_to_sgvec() then tries to return a "zeroth" fragment that seems to occur before the fragments counted by ->nr_frags and we hit the end of the sglist too early. Note that __skb_to_sgvec() also has an skb_walk_frags() loop that is recursive up to 24 deep. I'm not sure if I need to take account of that too - or if there's an easy way of counting those frags too. Fix this by counting an extra frag and allocating a larger sglist based on that. Fixes: d0d5c0cd ("rxrpc: Use skb_unshare() rather than skb_cow_data()") Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-afs@lists.infradead.org
-
David Howells authored
Because rxrpc pretends to be a tunnel on top of a UDP/UDP6 socket, allowing it to siphon off UDP packets early in the handling of received UDP packets thereby avoiding the packet going through the UDP receive queue, it doesn't get ICMP packets through the UDP ->sk_error_report() callback. In fact, it doesn't appear that there's any usable option for getting hold of ICMP packets. Fix this by adding a new UDP encap hook to distribute error messages for UDP tunnels. If the hook is set, then the tunnel driver will be able to see ICMP packets. The hook provides the offset into the packet of the UDP header of the original packet that caused the notification. An alternative would be to call the ->error_handler() hook - but that requires that the skbuff be cloned (as ip_icmp_error() or ipv6_cmp_error() do, though isn't really necessary or desirable in rxrpc's case is we want to parse them there and then, not queue them). Changes ======= ver #3) - Fixed an uninitialised variable. ver #2) - Fixed some missing CONFIG_AF_RXRPC_IPV6 conditionals. Fixes: 5271953c ("rxrpc: Use the UDP encap_rcv hook") Signed-off-by: David Howells <dhowells@redhat.com>
-
Waiman Long authored
mm/slab_common: Deleting kobject in kmem_cache_destroy() without holding slab_mutex/cpu_hotplug_lock A circular locking problem is reported by lockdep due to the following circular locking dependency. +--> cpu_hotplug_lock --> slab_mutex --> kn->active --+ | | +-----------------------------------------------------+ The forward cpu_hotplug_lock ==> slab_mutex ==> kn->active dependency happens in kmem_cache_destroy(): cpus_read_lock(); mutex_lock(&slab_mutex); ==> sysfs_slab_unlink() ==> kobject_del() ==> kernfs_remove() ==> __kernfs_remove() ==> kernfs_drain(): rwsem_acquire(&kn->dep_map, ...); The backward kn->active ==> cpu_hotplug_lock dependency happens in kernfs_fop_write_iter(): kernfs_get_active(); ==> slab_attr_store() ==> cpu_partial_store() ==> flush_all(): cpus_read_lock() One way to break this circular locking chain is to avoid holding cpu_hotplug_lock and slab_mutex while deleting the kobject in sysfs_slab_unlink() which should be equivalent to doing a write_lock and write_unlock pair of the kn->active virtual lock. Since the kobject structures are not protected by slab_mutex or the cpu_hotplug_lock, we can certainly release those locks before doing the delete operation. Move sysfs_slab_unlink() and sysfs_slab_release() to the newly created kmem_cache_release() and call it outside the slab_mutex & cpu_hotplug_lock critical sections. There will be a slight delay in the deletion of sysfs files if kmem_cache_release() is called indirectly from a work function. Fixes: 5a836bf6 ("mm: slub: move flush_cpu_slab() invocations __free_slab() invocations out of IRQ context") Signed-off-by: Waiman Long <longman@redhat.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: David Rientjes <rientjes@google.com> Link: https://lore.kernel.org/all/YwOImVd+nRUsSAga@hyeyoo/Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
-
Yacan Liu authored
For passive connections, the refcount increment has been done in smc_clcsock_accept()-->smc_sock_alloc(). Fixes: 3b2dec26 ("net/smc: restructure client and server code in af_smc") Signed-off-by: Yacan Liu <liuyacan@corp.netease.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Link: https://lore.kernel.org/r/20220830152314.838736-1-liuyacan@corp.netease.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Jakub Kicinski authored
This reverts commit 90fabae8. Patch was applied hastily, revert and let the v2 be reviewed. Fixes: 90fabae8 ("sch_cake: Return __NET_XMIT_STOLEN when consuming enqueued skb") Link: https://lore.kernel.org/all/87wnao2ha3.fsf@toke.dk/Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Eric Dumazet says: ==================== tcp: tcp challenge ack fixes syzbot found a typical data-race addressed in the first patch. While we are at it, second patch makes the global rate limit per net-ns and disabled by default. ==================== Link: https://lore.kernel.org/r/20220830185656.268523-1-eric.dumazet@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Eric Dumazet authored
Because per host rate limiting has been proven problematic (side channel attacks can be based on it), per host rate limiting of challenge acks ideally should be per netns and turned off by default. This is a long due followup of following commits: 083ae308 ("tcp: enable per-socket rate limiting of all 'challenge acks'") f2b2c582 ("tcp: mitigate ACK loops for connections as tcp_sock") 75ff39cc ("tcp: make challenge acks less predictable") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jason Baron <jbaron@akamai.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Eric Dumazet authored
challenge_timestamp can be read an written by concurrent threads. This was expected, but we need to annotate the race to avoid potential issues. Following patch moves challenge_timestamp and challenge_count to per-netns storage to provide better isolation. Fixes: 354e4aa3 ("tcp: RFC 5961 5.2 Blind Data Injection Attack Mitigation") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Kurt Kanzenbach authored
In case the source port cannot be decoded, print the warning only once. This still brings attention to the user and does not spam the logs at the same time. Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/20220830163448.8921-1-kurt@linutronix.deSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Nicolas Dichtel authored
__mkroute_input() uses fib_validate_source() to trigger an icmp redirect. My understanding is that fib_validate_source() is used to know if the src address and the gateway address are on the same link. For that, fib_validate_source() returns 1 (same link) or 0 (not the same network). __mkroute_input() is the only user of these positive values, all other callers only look if the returned value is negative. Since the below patch, fib_validate_source() didn't return anymore 1 when both addresses are on the same network, because the route lookup returns RT_SCOPE_LINK instead of RT_SCOPE_HOST. But this is, in fact, right. Let's adapat the test to return 1 again when both addresses are on the same link. CC: stable@vger.kernel.org Fixes: 747c1430 ("ip: fix dflt addr selection for connected nexthop") Reported-by: kernel test robot <yujie.liu@intel.com> Reported-by: Heng Qi <hengqi@linux.alibaba.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20220829100121.3821-1-nicolas.dichtel@6wind.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 31 Aug, 2022 15 commits
-
-
Jann Horn authored
anon_vma->degree tracks the combined number of child anon_vmas and VMAs that use the anon_vma as their ->anon_vma. anon_vma_clone() then assumes that for any anon_vma attached to src->anon_vma_chain other than src->anon_vma, it is impossible for it to be a leaf node of the VMA tree, meaning that for such VMAs ->degree is elevated by 1 because of a child anon_vma, meaning that if ->degree equals 1 there are no VMAs that use the anon_vma as their ->anon_vma. This assumption is wrong because the ->degree optimization leads to leaf nodes being abandoned on anon_vma_clone() - an existing anon_vma is reused and no new parent-child relationship is created. So it is possible to reuse an anon_vma for one VMA while it is still tied to another VMA. This is an issue because is_mergeable_anon_vma() and its callers assume that if two VMAs have the same ->anon_vma, the list of anon_vmas attached to the VMAs is guaranteed to be the same. When this assumption is violated, vma_merge() can merge pages into a VMA that is not attached to the corresponding anon_vma, leading to dangling page->mapping pointers that will be dereferenced during rmap walks. Fix it by separately tracking the number of child anon_vmas and the number of VMAs using the anon_vma as their ->anon_vma. Fixes: 7a3ef208 ("mm: prevent endless growth of anon_vma hierarchy") Cc: stable@kernel.org Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Toke Høiland-Jørgensen authored
When the GSO splitting feature of sch_cake is enabled, GSO superpackets will be broken up and the resulting segments enqueued in place of the original skb. In this case, CAKE calls consume_skb() on the original skb, but still returns NET_XMIT_SUCCESS. This can confuse parent qdiscs into assuming the original skb still exists, when it really has been freed. Fix this by adding the __NET_XMIT_STOLEN flag to the return value in this case. Fixes: 0c850344 ("sch_cake: Conditionally split GSO segments") Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk> Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-18231 Link: https://lore.kernel.org/r/20220831092103.442868-1-toke@toke.dkSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Axel Rasmussen authored
This is the result of `sort tools/testing/selftests/net/.gitignore`, but preserving the comment at the top. Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Axel Rasmussen <axelrasmussen@google.com> Link: https://lore.kernel.org/r/20220829184748.1535580-1-axelrasmussen@google.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Randy Dunlap authored
Change occurrences of "it's" that are possessive to "its" so that they don't read as "it is". Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20220829235414.17110-1-rdunlap@infradead.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Cong Wang authored
strp_init() is called just a few lines above this csk->sk_user_data check, it also initializes strp->work etc., therefore, it is unnecessary to call strp_done() to cancel the freshly initialized work. And if sk_user_data is already used by KCM, psock->strp should not be touched, particularly strp->work state, so we need to move strp_init() after the csk->sk_user_data check. This also makes a lockdep warning reported by syzbot go away. Reported-and-tested-by: syzbot+9fc084a4348493ef65d2@syzkaller.appspotmail.com Reported-by: syzbot+e696806ef96cdd2d87cd@syzkaller.appspotmail.com Fixes: e5571240 ("kcm: Check if sk_user_data already set in kcm_attach") Fixes: dff8baa2 ("kcm: Call strp_stop before strp_done in kcm_attach") Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Link: https://lore.kernel.org/r/20220827181314.193710-1-xiyou.wangcong@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
David Thompson authored
This patch adds logic to compute the MDIO period based on the i1clk, and thereafter write the MDIO period into the YU MDIO config register. The i1clk resource from the ACPI table is used to provide addressing to YU bootrecord PLL registers. The values in these registers are used to compute MDIO period. If the i1clk resource is not present in the ACPI table, then the current default hardcorded value of 430Mhz is used. The i1clk clock value of 430MHz is only accurate for boards with BF2 mid bin and main bin SoCs. The BF2 high bin SoCs have i1clk = 500MHz, but can support a slower MDIO period. Fixes: f92e1869 ("Add Mellanox BlueField Gigabit Ethernet driver") Reviewed-by: Asmaa Mnebhi <asmaa@nvidia.com> Signed-off-by: David Thompson <davthompson@nvidia.com> Link: https://lore.kernel.org/r/20220826155916.12491-1-davthompson@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Linus Torvalds authored
Merge tag 'fscache-fixes-20220831' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull fscache/cachefiles fixes from David Howells: - Fix kdoc on fscache_use/unuse_cookie(). - Fix the error returned by cachefiles_ondemand_copen() from an upcall result. - Fix the distribution of requests in on-demand mode in cachefiles to be fairer by cycling through them rather than picking the one with the lowest ID each time (IDs being reused). * tag 'fscache-fixes-20220831' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: cachefiles: make on-demand request distribution fairer cachefiles: fix error return code in cachefiles_ondemand_copen() fscache: fix misdocumented parameter
-
git://git.kernel.org/pub/scm/linux/kernel/git/hid/hidLinus Torvalds authored
Pull HID fixes from Jiri Kosina: - NULL pointer dereference fix for Steam driver (Lee Jones) - memory leak fix for hidraw (Karthik Alapati) - regression fix for functionality of some UCLogic tables (Benjamin Tissoires) - a few new device IDs and device-specific quirks * tag 'for-linus-2022083101' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: HID: nintendo: fix rumble worker null pointer deref HID: intel-ish-hid: ipc: Add Meteor Lake PCI device ID HID: input: fix uclogic tablets HID: Add Apple Touchbar on T2 Macs in hid_have_special_driver list HID: add Lenovo Yoga C630 battery quirk HID: AMD_SFH: Add a DMI quirk entry for Chromebooks HID: thrustmaster: Add sparco wheel and fix array length hid: intel-ish-hid: ishtp: Fix ishtp client sending disordered message HID: ishtp-hid-clientHID: ishtp-hid-client: Fix comment typo HID: asus: ROG NKey: Ignore portion of 0x5a report HID: hidraw: fix memory leak in hidraw_release() HID: steam: Prevent NULL pointer dereference in steam_{recv,send}_report
-
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Linus Torvalds authored
Pull crypto fix from Herbert Xu: "Fix a boot performance regression due to an unnecessary dependency on XOR_BLOCKS" * tag 'v6.0-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: lib - remove unneeded selection of XOR_BLOCKS
-
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsmLinus Torvalds authored
Pull LSM support for IORING_OP_URING_CMD from Paul Moore: "Add SELinux and Smack controls to the io_uring IORING_OP_URING_CMD. These are necessary as without them the IORING_OP_URING_CMD remains outside the purview of the LSMs (Luis' LSM patch, Casey's Smack patch, and my SELinux patch). They have been discussed at length with the io_uring folks, and Jens has given his thumbs-up on the relevant patches (see the commit descriptions). There is one patch that is not strictly necessary, but it makes testing much easier and is very trivial: the /dev/null IORING_OP_URING_CMD patch." * tag 'lsm-pr-20220829' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: Smack: Provide read control for io_uring_cmd /dev/null: add IORING_OP_URING_CMD support selinux: implement the security_uring_cmd() LSM hook lsm,io_uring: add LSM hooks for the new uring_cmd file op
-
Xin Yin authored
For now, enqueuing and dequeuing on-demand requests all start from idx 0, this makes request distribution unfair. In the weighty concurrent I/O scenario, the request stored in higher idx will starve. Searching requests cyclically in cachefiles_ondemand_daemon_read, makes distribution fairer. Fixes: c8383054 ("cachefiles: notify the user daemon when looking up cookie") Reported-by: Yongqing Li <liyongqing@bytedance.com> Signed-off-by: Xin Yin <yinxin.x@bytedance.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeffle Xu <jefflexu@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20220817065200.11543-1-yinxin.x@bytedance.com/ # v1 Link: https://lore.kernel.org/r/20220825020945.2293-1-yinxin.x@bytedance.com/ # v2
-
Sun Ke authored
The cache_size field of copen is specified by the user daemon. If cache_size < 0, then the OPEN request is expected to fail, while copen itself shall succeed. However, returning 0 is indeed unexpected when cache_size is an invalid error code. Fix this by returning error when cache_size is an invalid error code. Changes ======= v4: update the code suggested by Dan v3: update the commit log suggested by Jingbo. Fixes: c8383054 ("cachefiles: notify the user daemon when looking up cookie") Signed-off-by: Sun Ke <sunke32@huawei.com> Suggested-by: Jeffle Xu <jefflexu@linux.alibaba.com> Suggested-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/20220818111935.1683062-1-sunke32@huawei.com/ # v2 Link: https://lore.kernel.org/r/20220818125038.2247720-1-sunke32@huawei.com/ # v3 Link: https://lore.kernel.org/r/20220826023515.3437469-1-sunke32@huawei.com/ # v4
-
Khalid Masum authored
This patch fixes two warnings generated by make docs. The functions fscache_use_cookie and fscache_unuse_cookie, both have a parameter named cookie. But they are documented with the name "object" with unclear description. Which generates the warning when creating docs. This commit will replace the currently misdocumented parameter names with the correct ones while adding proper descriptions. CC: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Khalid Masum <khalid.masum.92@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20220521142446.4746-1-khalid.masum.92@gmail.com/ # v1 Link: https://lore.kernel.org/r/20220818040738.12036-1-khalid.masum.92@gmail.com/ # v2 Link: https://lore.kernel.org/r/880d7d25753fb326ee17ac08005952112fcf9bdb.1657360984.git.mchehab@kernel.org/ # Mauro's version
-
Duoming Zhou authored
The function neigh_timer_handler() is a timer handler that runs in an atomic context. When used by rocker, neigh_timer_handler() calls "kzalloc(.., GFP_KERNEL)" that may sleep. As a result, the sleep in atomic context bug will happen. One of the processes is shown below: ofdpa_fib4_add() ... neigh_add_timer() (wait a timer) neigh_timer_handler() neigh_release() neigh_destroy() rocker_port_neigh_destroy() rocker_world_port_neigh_destroy() ofdpa_port_neigh_destroy() ofdpa_port_ipv4_neigh() kzalloc(sizeof(.., GFP_KERNEL) //may sleep This patch changes the gfp_t parameter of kzalloc() from GFP_KERNEL to GFP_ATOMIC in order to mitigate the bug. Fixes: 00fc0c51 ("rocker: Change world_ops API and implementation to be switchdev independant") Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Dan Carpenter authored
Don't just print a warning. Clean up and return an error as well. Fixes: c8349639 ("net: lan966x: Add FDMA functionality") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Link: https://lore.kernel.org/r/YwjgDm/SVd5c1tQU@kiliSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-