    bpf, sockmap: On cleanup we additionally need to remove cached skb · 476d9801
    John Fastabend authored
    It's possible that, if a socket is closed while the receive thread is under
    memory pressure, the thread may have cached an skb. We need to ensure these
    skbs are freed along with the normal ingress_skb queue.
    
    Before 799aa7f9 ("skmsg: Avoid lock_sock() in sk_psock_backlog()"), tear
    down and backlog processing both held sock_lock for the common case of
    socket close or unhash. The two could therefore never run in parallel,
    so the kfree is all those kernels need.
    
    But the latest kernels include commit 799aa7f98d5e, and this requires a
    bit more work. Without the ingress_lock guarding reads and writes of
    state->skb, two races are possible. First, the tear down could run before
    the state update, leaking memory. Second, and worse, the backlog could
    read the state while interleaved with the tear down, so the tear down side
    frees state->skb while the backlog side already holds a reference to it.
    To resolve such races we wrap the accesses in ingress_lock on both sides,
    serializing the tear down and backlog cases. In both cases this only
    happens after an EAGAIN error, so having an extra lock in place is likely
    fine. The normal path will skip the locks.
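
The serialization described above can be sketched in userspace C. This is a minimal model, not the kernel code: `struct psock_model`, `stash_after_eagain()`, `teardown()` and the `live_bufs` counter are hypothetical names, a pthread mutex stands in for ingress_lock, and the `tx_enabled` flag is an assumption of this sketch for how the stash side learns that tear down has already run.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for an skb; only its allocation matters here. */
struct buf { int len; };

static int live_bufs; /* buffers allocated and not yet freed */

static struct buf *buf_alloc(void)
{
	struct buf *b = calloc(1, sizeof(*b));

	assert(b != NULL);
	live_bufs++;
	return b;
}

static void buf_free(struct buf *b)
{
	if (b) {
		live_bufs--;
		free(b);
	}
}

/* Hypothetical model of the psock state touched by both sides. */
struct psock_model {
	pthread_mutex_t ingress_lock; /* stand-in for ingress_lock */
	bool tx_enabled;              /* assumed "not torn down yet" flag */
	struct buf *cached;           /* stand-in for state->skb */
};

static void psock_model_init(struct psock_model *p)
{
	pthread_mutex_init(&p->ingress_lock, NULL);
	p->tx_enabled = true;
	p->cached = NULL;
}

/* Backlog side after an EAGAIN-style failure: stash the buffer, or drop
 * it if tear down already ran. Returns true if the buffer was stashed.
 */
static bool stash_after_eagain(struct psock_model *p, struct buf *b)
{
	bool stashed = false;

	pthread_mutex_lock(&p->ingress_lock);
	if (p->tx_enabled) {
		p->cached = b;
		stashed = true;
	} else {
		buf_free(b);
	}
	pthread_mutex_unlock(&p->ingress_lock);
	return stashed;
}

/* Tear down side: free the cached buffer and clear the pointer under the
 * same lock, so the backlog side can never pick up a freed buffer.
 */
static void teardown(struct psock_model *p)
{
	pthread_mutex_lock(&p->ingress_lock);
	p->tx_enabled = false;
	buf_free(p->cached);
	p->cached = NULL;
	pthread_mutex_unlock(&p->ingress_lock);
}
```

Because both functions take the same lock, a buffer stashed before tear down is freed by `teardown()`, and a stash attempted after tear down is dropped on the spot. Neither ordering leaks in this model.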
    
    Note, we check state->skb before grabbing the lock. This works because
    we can only enqueue while holding the mutex we already hold, which avoids
    a race on adding state->skb after the check. If the tear down path is
    running, that is also fine: if the tear down path then removes state->skb,
    we will simply set skb=NULL and the subsequent goto is skipped. This
    slight complication avoids locking in the normal case.
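
The check-then-lock pattern in the note above might look roughly like this in userspace C. It is a sketch under stated assumptions rather than the kernel implementation: `struct backlog_model`, `backlog_take_cached()` and `teardown_clear()` are hypothetical names, a pthread mutex stands in for ingress_lock, and a C11 atomic pointer is used so that the unlocked peek is well defined in userspace.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for an skb. */
struct buf { int len; };

/* Hypothetical model of the backlog work state. */
struct backlog_model {
	pthread_mutex_t ingress_lock;  /* stand-in for ingress_lock */
	_Atomic(struct buf *) cached;  /* stand-in for state->skb */
};

static void backlog_model_init(struct backlog_model *m)
{
	pthread_mutex_init(&m->ingress_lock, NULL);
	m->cached = NULL;
}

/* Backlog side. The caller is assumed to hold the work mutex, so nothing
 * can newly set m->cached between the peek and the locked read.
 */
static struct buf *backlog_take_cached(struct backlog_model *m)
{
	struct buf *b = NULL;

	/* Cheap unlocked peek: the common case sees NULL and skips the lock. */
	if (atomic_load_explicit(&m->cached, memory_order_relaxed)) {
		pthread_mutex_lock(&m->ingress_lock);
		/* Tear down may have cleared the pointer since the peek; in
		 * that case b stays NULL and the caller skips the resume path.
		 */
		b = m->cached;
		m->cached = NULL;
		pthread_mutex_unlock(&m->ingress_lock);
	}
	return b;
}

/* Tear down side: clear the cached pointer under the lock. Freeing the
 * buffer is left out to keep the sketch focused on the pointer handoff.
 */
static void teardown_clear(struct backlog_model *m)
{
	pthread_mutex_lock(&m->ingress_lock);
	m->cached = NULL;
	pthread_mutex_unlock(&m->ingress_lock);
}
```

A caller would resume a partially sent buffer only when `backlog_take_cached()` returns non-NULL, which corresponds to the conditional goto mentioned in the note.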
    
    With this fix we no longer see the following warning splat from the TCP
    side on socket close when we hit the above case with redirect to ingress
    self.
    
    [224913.935822] WARNING: CPU: 3 PID: 32100 at net/core/stream.c:208 sk_stream_kill_queues+0x212/0x220
    [224913.935841] Modules linked in: fuse overlay bpf_preload x86_pkg_temp_thermal intel_uncore wmi_bmof squashfs sch_fq_codel efivarfs ip_tables x_tables uas xhci_pci ixgbe mdio xfrm_algo xhci_hcd wmi
    [224913.935897] CPU: 3 PID: 32100 Comm: fgs-bench Tainted: G          I       5.14.0-rc1alu+ #181
    [224913.935908] Hardware name: Dell Inc. Precision 5820 Tower/002KVM, BIOS 1.9.2 01/24/2019
    [224913.935914] RIP: 0010:sk_stream_kill_queues+0x212/0x220
    [224913.935923] Code: 8b 83 20 02 00 00 85 c0 75 20 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 89 df e8 2b 11 fe ff eb c3 0f 0b e9 7c ff ff ff 0f 0b eb ce <0f> 0b 5b 5d 41 5c 41 5d 41 5e 41 5f c3 90 0f 1f 44 00 00 41 57 41
    [224913.935932] RSP: 0018:ffff88816271fd38 EFLAGS: 00010206
    [224913.935941] RAX: 0000000000000ae8 RBX: ffff88815acd5240 RCX: dffffc0000000000
    [224913.935948] RDX: 0000000000000003 RSI: 0000000000000ae8 RDI: ffff88815acd5460
    [224913.935954] RBP: ffff88815acd5460 R08: ffffffff955c0ae8 R09: fffffbfff2e6f543
    [224913.935961] R10: ffffffff9737aa17 R11: fffffbfff2e6f542 R12: ffff88815acd5390
    [224913.935967] R13: ffff88815acd5480 R14: ffffffff98d0c080 R15: ffffffff96267500
    [224913.935974] FS:  00007f86e6bd1700(0000) GS:ffff888451cc0000(0000) knlGS:0000000000000000
    [224913.935981] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [224913.935988] CR2: 000000c0008eb000 CR3: 00000001020e0005 CR4: 00000000003706e0
    [224913.935994] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [224913.936000] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [224913.936007] Call Trace:
    [224913.936016]  inet_csk_destroy_sock+0xba/0x1f0
    [224913.936033]  __tcp_close+0x620/0x790
    [224913.936047]  tcp_close+0x20/0x80
    [224913.936056]  inet_release+0x8f/0xf0
    [224913.936070]  __sock_release+0x72/0x120
    [224913.936083]  sock_close+0x14/0x20
    
    Fixes: a136678c ("bpf: sk_msg, zap ingress queue on psock down")
    Signed-off-by: John Fastabend <john.fastabend@gmail.com>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
    Acked-by: Martin KaFai Lau <kafai@fb.com>
    Link: https://lore.kernel.org/bpf/20210727160500.1713554-3-john.fastabend@gmail.com
skmsg.c 27.4 KB