- 13 Mar, 2020 6 commits
-
-
David Howells authored
Fix the handling of signals in client rxrpc calls made by the afs filesystem. Ignore signals completely, leaving call abandonment or connection loss to be detected by timeouts inside AF_RXRPC. Allowing a filesystem call to be interrupted after the entire request has been transmitted and an abort sent means that the server may or may not have done the action - and we don't know. It may even be worse than that for older servers. Fixes: bc5e3a54 ("rxrpc: Use MSG_WAITALL to tell sendmsg() to temporarily ignore signals") Signed-off-by: David Howells <dhowells@redhat.com>
-
David Howells authored
When an AFS service handler function aborts a call, AF_RXRPC marks the call as complete - which means that it's not going to get any more packets from the receiver. This is a problem because reception of the final ACK is what triggers afs_deliver_to_call() to drop the final ref on the afs_call object. Instead, aborted AFS service calls may then just sit around waiting for ever or until they're displaced by a new call on the same connection channel or a connection-level abort. Fix this by calling afs_set_call_complete() to finalise the afs_call struct representing the call. However, we then need to drop the ref that stops the call from being deallocated. We can do this in afs_set_call_complete(), as the work queue is holding a separate ref of its own, but then we shouldn't do it in afs_process_async_call() and afs_delete_async_call(). call->drop_ref is set to indicate that a ref needs dropping for a call and this is dealt with when we transition a call to AFS_CALL_COMPLETE. But then we also need to get rid of the ref that pins an asynchronous client call. We can do this by the same mechanism, setting call->drop_ref for an async client call too. We can also get rid of call->incoming since nothing ever sets it and only one thing ever checks it (futilely). A trace of the rxrpc_call and afs_call struct ref counting looks like: <idle>-0 [001] ..s5 164.764892: rxrpc_call: c=00000002 SEE u=3 sp=rxrpc_new_incoming_call+0x473/0xb34 a=00000000442095b5 <idle>-0 [001] .Ns5 164.766001: rxrpc_call: c=00000002 QUE u=4 sp=rxrpc_propose_ACK+0xbe/0x551 a=00000000442095b5 <idle>-0 [001] .Ns4 164.766005: rxrpc_call: c=00000002 PUT u=3 sp=rxrpc_new_incoming_call+0xa3f/0xb34 a=00000000442095b5 <idle>-0 [001] .Ns7 164.766433: afs_call: c=00000002 WAKE u=2 o=11 sp=rxrpc_notify_socket+0x196/0x33c kworker/1:2-1810 [001] ...1 164.768409: rxrpc_call: c=00000002 SEE u=3 sp=rxrpc_process_call+0x25/0x7ae a=00000000442095b5 kworker/1:2-1810 [001] ...1 164.769439: rxrpc_tx_packet: c=00000002 e9f1a7a8:95786a88:00000008:09c5 00000001 00000000 02 22 ACK CallAck kworker/1:2-1810 [001] ...1 164.769459: rxrpc_call: c=00000002 PUT u=2 sp=rxrpc_process_call+0x74f/0x7ae a=00000000442095b5 kworker/1:2-1810 [001] ...1 164.770794: afs_call: c=00000002 QUEUE u=3 o=12 sp=afs_deliver_to_call+0x449/0x72c kworker/1:2-1810 [001] ...1 164.770829: afs_call: c=00000002 PUT u=2 o=12 sp=afs_process_async_call+0xdb/0x11e kworker/1:2-1810 [001] ...2 164.771084: rxrpc_abort: c=00000002 95786a88:00000008 s=0 a=1 e=1 K-1 kworker/1:2-1810 [001] ...1 164.771461: rxrpc_tx_packet: c=00000002 e9f1a7a8:95786a88:00000008:09c5 00000002 00000000 04 00 ABORT CallAbort kworker/1:2-1810 [001] ...1 164.771466: afs_call: c=00000002 PUT u=1 o=12 sp=SRXAFSCB_ProbeUuid+0xc1/0x106 The abort generated in SRXAFSCB_ProbeUuid(), labelled "K-1", indicates that the local filesystem/cache manager didn't recognise the UUID as its own. Fixes: 2067b2b3 ("afs: Fix the CB.ProbeUuid service handler to reply correctly") Signed-off-by: David Howells <dhowells@redhat.com>
-
David Howells authored
Fix a couple of tracelines to indicate the usage count after the atomic op, not the usage count before it to be consistent with other afs and rxrpc trace lines. Change the wording of the afs_call_trace_work trace ID label from "WORK" to "QUEUE" to reflect the fact that it's queueing work, not doing work. Fixes: 341f741f ("afs: Refcount the afs_call struct") Signed-off-by: David Howells <dhowells@redhat.com>
-
David Howells authored
Fix the handling of sendmsg() with MSG_WAITALL for userspace to round the timeout for when a signal occurs up to at least two jiffies as a 1 jiffy timeout may end up being effectively 0 if jiffies wraps at the wrong time. Fixes: bc5e3a54 ("rxrpc: Use MSG_WAITALL to tell sendmsg() to temporarily ignore signals") Signed-off-by: David Howells <dhowells@redhat.com>
-
David Howells authored
Fix the interruptibility of kernel-initiated client calls so that they're either only interruptible when they're waiting for a call slot to come available or they're not interruptible at all. Either way, they're not interruptible during transmission. This should help prevent StoreData calls from being interrupted when writeback is in progress. It doesn't, however, handle interruption during the receive phase. Userspace-initiated calls are still interruptable. After the signal has been handled, sendmsg() will return the amount of data copied out of the buffer and userspace can perform another sendmsg() call to continue transmission. Fixes: bc5e3a54 ("rxrpc: Use MSG_WAITALL to tell sendmsg() to temporarily ignore signals") Signed-off-by: David Howells <dhowells@redhat.com>
-
David Howells authored
Abstract out the calculation of there being sufficient Tx buffer space. This is reproduced several times in the rxrpc sendmsg code. Signed-off-by: David Howells <dhowells@redhat.com>
-
- 12 Mar, 2020 14 commits
-
-
Chris Packham authored
Per the dt-binding the interrupt is optional so use platform_get_irq_optional() instead of platform_get_irq(). Since commit 7723f4c5 ("driver core: platform: Add an error message to platform_get_irq*()") platform_get_irq() produces an error message orion-mdio f1072004.mdio: IRQ index 0 not found which is perfectly normal if one hasn't specified the optional property in the device tree. Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Andrew Lunn authored
Only the bottom 12 bits contain the ATU bin occupancy statistics. The upper bits need masking off. Fixes: e0c69ca7 ("net: dsa: mv88e6xxx: Add ATU occupancy via devlink resources") Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Locking newsk while still holding the listener lock triggered a lockdep splat [1] We can simply move the memcg code after we release the listener lock, as this can also help if multiple threads are sharing a common listener. Also fix a typo while reading socket sk_rmem_alloc. [1] WARNING: possible recursive locking detected 5.6.0-rc3-syzkaller #0 Not tainted -------------------------------------------- syz-executor598/9524 is trying to acquire lock: ffff88808b5b8b90 (sk_lock-AF_INET6){+.+.}, at: lock_sock include/net/sock.h:1541 [inline] ffff88808b5b8b90 (sk_lock-AF_INET6){+.+.}, at: inet_csk_accept+0x69f/0xd30 net/ipv4/inet_connection_sock.c:492 but task is already holding lock: ffff88808b5b9590 (sk_lock-AF_INET6){+.+.}, at: lock_sock include/net/sock.h:1541 [inline] ffff88808b5b9590 (sk_lock-AF_INET6){+.+.}, at: inet_csk_accept+0x8d/0xd30 net/ipv4/inet_connection_sock.c:445 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(sk_lock-AF_INET6); lock(sk_lock-AF_INET6); *** DEADLOCK *** May be due to missing lock nesting notation 1 lock held by syz-executor598/9524: #0: ffff88808b5b9590 (sk_lock-AF_INET6){+.+.}, at: lock_sock include/net/sock.h:1541 [inline] #0: ffff88808b5b9590 (sk_lock-AF_INET6){+.+.}, at: inet_csk_accept+0x8d/0xd30 net/ipv4/inet_connection_sock.c:445 stack backtrace: CPU: 0 PID: 9524 Comm: syz-executor598 Not tainted 5.6.0-rc3-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x188/0x20d lib/dump_stack.c:118 print_deadlock_bug kernel/locking/lockdep.c:2370 [inline] check_deadlock kernel/locking/lockdep.c:2411 [inline] validate_chain kernel/locking/lockdep.c:2954 [inline] __lock_acquire.cold+0x114/0x288 kernel/locking/lockdep.c:3954 lock_acquire+0x197/0x420 kernel/locking/lockdep.c:4484 lock_sock_nested+0xc5/0x110 net/core/sock.c:2947 lock_sock include/net/sock.h:1541 [inline] inet_csk_accept+0x69f/0xd30 net/ipv4/inet_connection_sock.c:492 inet_accept+0xe9/0x7c0 net/ipv4/af_inet.c:734 __sys_accept4_file+0x3ac/0x5b0 net/socket.c:1758 __sys_accept4+0x53/0x90 net/socket.c:1809 __do_sys_accept4 net/socket.c:1821 [inline] __se_sys_accept4 net/socket.c:1818 [inline] __x64_sys_accept4+0x93/0xf0 net/socket.c:1818 do_syscall_64+0xf6/0x790 arch/x86/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x4445c9 Code: e8 0c 0d 03 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 eb 08 fc ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007ffc35b37608 EFLAGS: 00000246 ORIG_RAX: 0000000000000120 RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004445c9 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003 RBP: 0000000000000000 R08: 0000000000306777 R09: 0000000000306777 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00000000004053d0 R14: 0000000000000000 R15: 0000000000000000 Fixes: d752a498 ("net: memcg: late association of sock to memcg") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Shakeel Butt <shakeelb@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Julian Wiedmann says: ==================== s390/qeth: fixes 2020-03-11 please apply the following patch series for qeth to netdev's net tree. Just one fix to get the RX buffer pool resizing right, with two preparatory cleanups. This is on the larger side given where we are in the -rc cycle, but a big chunk of the delta is just refactoring to make the fix look nice. I intentionally split these off from yesterday's series. No objections if you'd rather punt them to net-next, the series should apply cleanly. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Julian Wiedmann authored
The RX buffer pool is allocated in qeth_alloc_qdio_queues(). A subsequent pool resizing is then handled in a very simple way: first free the current pool, then allocate a new pool of the requested size. There's two ways where this can go wrong: 1. if the resize action happens _before_ the initial pool was allocated, then a subsequent initialization will call qeth_alloc_qdio_queues() and fill the pool with a second(!) set of pages. We consume twice the planned amount of memory. This is easy to fix - just skip the resizing if the queues haven't been allocated yet. 2. if the initial pool was created by qeth_alloc_qdio_queues() but a subsequent resizing fails, then the device has no(!) RX buffer pool. The next initialization will _not_ call qeth_alloc_qdio_queues(), and attempting to back the RX buffers with pages in qeth_init_qdio_queues() will fail. Not very difficult to fix either - instead of re-allocating the whole pool, just allocate/free as many entries to match the desired size. Fixes: 4a71df50 ("qeth: new qeth device driver") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Julian Wiedmann authored
In preparation for a subsequent fix, split out helpers to allocate/free individual pool entries. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Julian Wiedmann authored
The RX buffer elements are always backed with full pages, reflect this in the pointer type. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Paolo Lungaroni authored
The Internet Assigned Numbers Authority (IANA) has recently assigned a protocol number value of 143 for Ethernet [1]. Before this assignment, encapsulation mechanisms such as Segment Routing used the IPv6-NoNxt protocol number (59) to indicate that the encapsulated payload is an Ethernet frame. In this patch, we add the definition of the Ethernet protocol number to the kernel headers and update the SRv6 L2 tunnels to use it. [1] https://www.iana.org/assignments/protocol-numbers/protocol-numbers.xhtmlSigned-off-by: Paolo Lungaroni <paolo.lungaroni@cnit.it> Reviewed-by: Andrea Mayer <andrea.mayer@uniroma2.it> Acked-by: Ahmed Abdelsalam <ahmed.abdelsalam@gssi.it> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Andrew Lunn authored
By default, DSA drivers should configure CPU and DSA ports to their maximum speed. In many configurations this is sufficient to make the link work. In some cases it is necessary to configure the link to run slower, e.g. because of limitations of the SoC it is connected to. Or back to back PHYs are used and the PHY needs to be driven in order to establish link. In this case, phylink is used. Only instantiate phylink if it is required. If there is no PHY, or no fixed link properties, phylink can upset a link which works in the default configuration. Fixes: 0e279218 ("net: dsa: Use PHYLINK for the CPU/DSA ports") Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Willem de Bruijn authored
In one error case, tpacket_rcv drops packets after incrementing the ring producer index. If this happens, it does not update tp_status to TP_STATUS_USER and thus the reader is stalled for an iteration of the ring, causing out of order arrival. The only such error path is when virtio_net_hdr_from_skb fails due to encountering an unknown GSO type. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Dominik Czarnota authored
This patch fixes an off-by-one error in strncpy size argument in drivers/net/ethernet/samsung/sxgbe/sxgbe_main.c. The issue is that in: strncmp(opt, "eee_timer:", 6) the passed string literal: "eee_timer:" has 10 bytes (without the NULL byte) and the passed size argument is 6. As a result, the logic will also accept other, malformed strings, e.g. "eee_tiXXX:". This bug doesn't seem to have any security impact since its present in module's cmdline parsing code. Signed-off-by: Dominik Czarnota <dominik.b.czarnota@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Amol Grover authored
caifdevs->list is traversed using list_for_each_entry_rcu() outside an RCU read-side critical section but under the protection of rtnl_mutex. Hence, add the corresponding lockdep expression to silence the following false-positive warning: [ 10.868467] ============================= [ 10.869082] WARNING: suspicious RCU usage [ 10.869817] 5.6.0-rc1-00177-g06ec0a154aae4 #1 Not tainted [ 10.870804] ----------------------------- [ 10.871557] net/caif/caif_dev.c:115 RCU-list traversed in non-reader section!! Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Amol Grover <frextrite@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
Remove Sathya Perla, sathya.perla@broadcom.com is bouncing. The driver has 3 more maintainers. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
fec_enet_set_coalesce() validates the previously set params and if they are within range proceeds to apply the new ones. The new ones, however, are not validated. This seems backwards, probably a copy-paste error? Compile tested only. Fixes: d851b47b ("net: fec: add interrupt coalescence feature support") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Fugang Duan <fugang.duan@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 11 Mar, 2020 6 commits
-
-
Nathan Chancellor authored
Clang warns: drivers/net/ethernet/freescale/dpaa/dpaa_eth.c:2860:9: warning: converting the result of '?:' with integer constants to a boolean always evaluates to 'true' [-Wtautological-constant-compare] return DPAA_FD_DATA_ALIGNMENT ? ALIGN(headroom, ^ drivers/net/ethernet/freescale/dpaa/dpaa_eth.c:131:34: note: expanded from macro 'DPAA_FD_DATA_ALIGNMENT' \#define DPAA_FD_DATA_ALIGNMENT (fman_has_errata_a050385() ? 64 : 16) ^ 1 warning generated. This was exposed by commit 3c68b8ff ("dpaa_eth: FMan erratum A050385 workaround") even though it appears to have been an issue since the introductory commit 9ad1a374 ("dpaa_eth: add support for DPAA Ethernet") since DPAA_FD_DATA_ALIGNMENT has never been able to be zero. Just replace the whole boolean expression with the true branch, as it is always been true. Link: https://github.com/ClangBuiltLinux/linux/issues/928Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Madalin Bucur <madalin.bucur@oss.nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Merge tag 'mac80211-for-net-2020-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== A couple of fixes: * three netlink validation fixes * a mesh path selection fix ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nicolas Cavallari authored
When trying to transmit to an unknown destination, the mesh code would unconditionally transmit a HWMP PREQ even if HWMP is not the current path selection algorithm. Signed-off-by: Nicolas Cavallari <nicolas.cavallari@green-communications.fr> Link: https://lore.kernel.org/r/20200305140409.12204-1-cavallar@lri.frSigned-off-by: Johannes Berg <johannes.berg@intel.com>
-
Jakub Kicinski authored
Add missing attribute validation for NL80211_ATTR_OPER_CLASS to the netlink policy. Fixes: 1057d35e ("cfg80211: introduce TDLS channel switch commands") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20200303051058.4089398-4-kuba@kernel.orgSigned-off-by: Johannes Berg <johannes.berg@intel.com>
-
Jakub Kicinski authored
Add missing attribute validation for beacon report scanning to the netlink policy. Fixes: 1d76250b ("nl80211: support beacon report scanning") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20200303051058.4089398-3-kuba@kernel.orgSigned-off-by: Johannes Berg <johannes.berg@intel.com>
-
Jakub Kicinski authored
Add missing attribute validation for critical protocol fields to the netlink policy. Fixes: 5de17984 ("cfg80211: introduce critical protocol indication from user-space") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20200303051058.4089398-2-kuba@kernel.orgSigned-off-by: Johannes Berg <johannes.berg@intel.com>
-
- 10 Mar, 2020 14 commits
-
-
David S. Miller authored
Julian Wiedmann says: ==================== s390/qeth: fixes 2020-03-10 This fixes three minor issues: 1) a setup parameter gets cleared unnecessarily when the HW config changes, 2) insufficient error handling when initially filling the RX ring, and 3) a rarely used worker that needs to be cancelled during tear down. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Julian Wiedmann authored
When qeth's napi poll code fails to refill an entirely empty RX ring, it kicks off buffer_reclaim_work to try again later. Make sure that this worker is cancelled when setting the qeth device offline. Otherwise a RX refill action can unexpectedly end up running concurrently to bigger re-configurations (eg. resizing the buffer pool), without any locking. Fixes: b3332930 ("qeth: add support for af_iucv HiperSockets transport") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Julian Wiedmann authored
qeth_init_qdio_queues() fills the RX ring with an initial set of RX buffers. If qeth_init_input_buffer() fails to back one of the RX buffers with memory, we need to bail out and report the error. Fixes: 4a71df50 ("qeth: new qeth device driver") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Julian Wiedmann authored
When an OSA device in prio-queue setup is reduced to 1 TX queue due to HW restrictions, we reset its the default_out_queue to 0. In the old code this was needed so that qeth_get_priority_queue() gets the queue selection right. But with proper multiqueue support we already reduced dev->real_num_tx_queues to 1, and so the stack puts all traffic on txq 0 without even calling .ndo_select_queue. Thus we can preserve the user's configuration, and apply it if the OSA device later re-gains support for multiple TX queues. Fixes: 73dc2daf ("s390/qeth: add TX multiqueue support for OSA devices") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Igor Russkikh says: ==================== MACSec bugfixes related to MAC address change We found out that there's an issue in MACSec code when the MAC address is changed. Both s/w and offloaded implementations don't update SCI when the MAC address changes at the moment, but they should do so, because SCI contains MAC in its first 6 octets. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Dmitry Bogdanov authored
Notify the offload engine about MAC address change to reconfigure it accordingly. Fixes: 3cf3227a ("net: macsec: hardware offloading infrastructure") Signed-off-by: Dmitry Bogdanov <dbogdanov@marvell.com> Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Dmitry Bogdanov authored
SCI should be updated, because it contains MAC in its first 6 octets. Fixes: c09440f7 ("macsec: introduce IEEE 802.1AE driver") Signed-off-by: Dmitry Bogdanov <dbogdanov@marvell.com> Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Juliet Kim authored
The ibmvnic driver does not check the device state when the device is removed. If the device is removed while a device reset is being processed, the remove may free structures needed by the reset, causing an oops. Fix this by checking the device state before processing device remove. Signed-off-by: Juliet Kim <julietk@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Karsten Graul authored
During IB device removal, cancel the event worker before the device structure is freed. Fixes: a4cf0443 ("smc: introduce SMC as an IB-client") Reported-by: syzbot+b297c6825752e7a07272@syzkaller.appspotmail.com Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Hangbin Liu authored
Rafał found an issue that for non-Ethernet interface, if we down and up frequently, the memory will be consumed slowly. The reason is we add allnodes/allrouters addressed in multicast list in ipv6_add_dev(). When link down, we call ipv6_mc_down(), store all multicast addresses via mld_add_delrec(). But when link up, we don't call ipv6_mc_up() for non-Ethernet interface to remove the addresses. This makes idev->mc_tomb getting bigger and bigger. The call stack looks like: addrconf_notify(NETDEV_REGISTER) ipv6_add_dev ipv6_dev_mc_inc(ff01::1) ipv6_dev_mc_inc(ff02::1) ipv6_dev_mc_inc(ff02::2) addrconf_notify(NETDEV_UP) addrconf_dev_config /* Alas, we support only Ethernet autoconfiguration. */ return; addrconf_notify(NETDEV_DOWN) addrconf_ifdown ipv6_mc_down igmp6_group_dropped(ff02::2) mld_add_delrec(ff02::2) igmp6_group_dropped(ff02::1) igmp6_group_dropped(ff01::1) After investigating, I can't found a rule to disable multicast on non-Ethernet interface. In RFC2460, the link could be Ethernet, PPP, ATM, tunnels, etc. In IPv4, it doesn't check the dev type when calls ip_mc_up() in inetdev_event(). Even for IPv6, we don't check the dev type and call ipv6_add_dev(), ipv6_dev_mc_inc() after register device. So I think it's OK to fix this memory consumer by calling ipv6_mc_up() for non-Ethernet interface. v2: Also check IFF_MULTICAST flag to make sure the interface supports multicast Reported-by: Rafał Miłecki <zajec5@gmail.com> Tested-by: Rafał Miłecki <zajec5@gmail.com> Fixes: 74235a25 ("[IPV6] addrconf: Fix IPv6 on tuntap tunnels") Fixes: 1666d49e ("mld: do not remove mld souce list info when set link down") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Shakeel Butt authored
If a TCP socket is allocated in IRQ context or cloned from unassociated (i.e. not associated to a memcg) in IRQ context then it will remain unassociated for its whole life. Almost half of the TCPs created on the system are created in IRQ context, so, memory used by such sockets will not be accounted by the memcg. This issue is more widespread in cgroup v1 where network memory accounting is opt-in but it can happen in cgroup v2 if the source socket for the cloning was created in root memcg. To fix the issue, just do the association of the sockets at the accept() time in the process context and then force charge the memory buffer already used and reserved by the socket. Signed-off-by: Shakeel Butt <shakeelb@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Shakeel Butt authored
We are testing network memory accounting in our setup and noticed inconsistent network memory usage and often unrelated cgroups network usage correlates with testing workload. On further inspection, it seems like mem_cgroup_sk_alloc() and cgroup_sk_alloc() are broken in irq context specially for cgroup v1. mem_cgroup_sk_alloc() and cgroup_sk_alloc() can be called in irq context and kind of assumes that this can only happen from sk_clone_lock() and the source sock object has already associated cgroup. However in cgroup v1, where network memory accounting is opt-in, the source sock can be unassociated with any cgroup and the new cloned sock can get associated with unrelated interrupted cgroup. Cgroup v2 can also suffer if the source sock object was created by process in the root cgroup or if sk_alloc() is called in irq context. The fix is to just do nothing in interrupt. WARNING: Please note that about half of the TCP sockets are allocated from the IRQ context, so, memory used by such sockets will not be accouted by the memcg. The stack trace of mem_cgroup_sk_alloc() from IRQ-context: CPU: 70 PID: 12720 Comm: ssh Tainted: 5.6.0-smp-DEV #1 Hardware name: ... Call Trace: <IRQ> dump_stack+0x57/0x75 mem_cgroup_sk_alloc+0xe9/0xf0 sk_clone_lock+0x2a7/0x420 inet_csk_clone_lock+0x1b/0x110 tcp_create_openreq_child+0x23/0x3b0 tcp_v6_syn_recv_sock+0x88/0x730 tcp_check_req+0x429/0x560 tcp_v6_rcv+0x72d/0xa40 ip6_protocol_deliver_rcu+0xc9/0x400 ip6_input+0x44/0xd0 ? ip6_protocol_deliver_rcu+0x400/0x400 ip6_rcv_finish+0x71/0x80 ipv6_rcv+0x5b/0xe0 ? ip6_sublist_rcv+0x2e0/0x2e0 process_backlog+0x108/0x1e0 net_rx_action+0x26b/0x460 __do_softirq+0x104/0x2a6 do_softirq_own_stack+0x2a/0x40 </IRQ> do_softirq.part.19+0x40/0x50 __local_bh_enable_ip+0x51/0x60 ip6_finish_output2+0x23d/0x520 ? ip6table_mangle_hook+0x55/0x160 __ip6_finish_output+0xa1/0x100 ip6_finish_output+0x30/0xd0 ip6_output+0x73/0x120 ? __ip6_finish_output+0x100/0x100 ip6_xmit+0x2e3/0x600 ? ipv6_anycast_cleanup+0x50/0x50 ? inet6_csk_route_socket+0x136/0x1e0 ? skb_free_head+0x1e/0x30 inet6_csk_xmit+0x95/0xf0 __tcp_transmit_skb+0x5b4/0xb20 __tcp_send_ack.part.60+0xa3/0x110 tcp_send_ack+0x1d/0x20 tcp_rcv_state_process+0xe64/0xe80 ? tcp_v6_connect+0x5d1/0x5f0 tcp_v6_do_rcv+0x1b1/0x3f0 ? tcp_v6_do_rcv+0x1b1/0x3f0 __release_sock+0x7f/0xd0 release_sock+0x30/0xa0 __inet_stream_connect+0x1c3/0x3b0 ? prepare_to_wait+0xb0/0xb0 inet_stream_connect+0x3b/0x60 __sys_connect+0x101/0x120 ? __sys_getsockopt+0x11b/0x140 __x64_sys_connect+0x1a/0x20 do_syscall_64+0x51/0x200 entry_SYSCALL_64_after_hwframe+0x44/0xa9 The stack trace of mem_cgroup_sk_alloc() from IRQ-context: Fixes: 2d758073 ("mm: memcontrol: consolidate cgroup socket tracking") Fixes: d979a39d ("cgroup: duplicate cgroup reference when cloning sockets") Signed-off-by: Shakeel Butt <shakeelb@google.com> Reviewed-by: Roman Gushchin <guro@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
Casey Leedomn <leedom@chelsio.com> is bouncing, Vishal indicated he's happy to take the role. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.open-mesh.org/linux-mergeDavid S. Miller authored
Simon Wunderlich says: ==================== Here is a batman-adv bugfix: - Don't schedule OGM for disabled interface, by Sven Eckelmann ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-