    af_unix: Fix garbage collector racing against connect() · 47d8ac01
    Michal Luczaj authored
    The garbage collector does not account for the risk of an embryo being
    enqueued during garbage collection. If such an embryo has a peer that
    carries SCM_RIGHTS, two consecutive passes of scan_children() may see
    different sets of children. This leads to an incorrectly elevated inflight
    count, and then to a dangling pointer within the gc_inflight_list.
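For readers less familiar with SCM_RIGHTS, here is a minimal userspace sketch of passing one descriptor over an AF_UNIX socket. The helper names send_fd() and recv_fd() are hypothetical and exist only for this illustration. While the message sits unread in the receive queue, the passed socket is what the text above calls in flight.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send one descriptor as SCM_RIGHTS ancillary data.
 * Returns 0 on success, -1 on error. */
int send_fd(int sock, int fd)
{
	char data = 'x';	/* one byte of ordinary payload to carry the cmsg */
	struct iovec iov = { .iov_base = &data, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;	/* forces correct alignment of buf */
	} ctl;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&ctl, 0, sizeof(ctl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one descriptor sent by send_fd().
 * Returns the new descriptor, or -1 on error. */
int recv_fd(int sock)
{
	char data;
	struct iovec iov = { .iov_base = &data, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} ctl;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd = -1;

	memset(&ctl, 0, sizeof(ctl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

A caller would typically create the channel with socketpair(AF_UNIX, SOCK_STREAM, 0, ...), call send_fd() on one end, close its own copy of the descriptor, and later call recv_fd() on the other end.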
    sockets are AF_UNIX/SOCK_STREAM
    S is an unconnected socket
    L is a listening in-flight socket bound to addr, not in fdtable
    V's fd will be passed via sendmsg(), gets inflight count bumped
    connect(S, addr)	sendmsg(S, [V]); close(V)	__unix_gc()
    ----------------	-------------------------	-----------
    NS = unix_create1()
    skb1 = sock_wmalloc(NS)
    L = unix_find_other(addr)
    unix_peer(S) = NS
    			// V count=1 inflight=0
     			NS = unix_peer(S)
     			skb2 = sock_alloc()
    			skb_queue_tail(NS, skb2[V])
    			// V became in-flight
    			// V count=2 inflight=1
    			close(V)
    			// V count=1 inflight=1
    			// GC candidate condition met
    						for u in gc_inflight_list:
    						  if (total_refs == inflight_refs)
    						    add u to gc_candidates
    						// gc_candidates={L, V}
    						for u in gc_candidates:
    						  scan_children(u, dec_inflight)
    						// embryo (skb1) was not
    						// reachable from L yet, so V's
    						// inflight remains unchanged
    __skb_queue_tail(L, skb1)
    						for u in gc_candidates:
    						  if (u.inflight)
    						    scan_children(u, inc_inflight_move_tail)
    						// V count=1 inflight=2 (!)
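The counter arithmetic in the interleaving above can be replayed with a small single-threaded toy model. This is a sketch, not kernel code: struct toy_sock, toy_enqueue(), toy_scan() and toy_gc_final_inflight() are hypothetical names, and the model assumes a single listener L whose children are reached through its embryos.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: the two counters tracked in the diagram, plus the
 * sockets reachable from this socket's receive queue. */
struct toy_sock {
	int count;			/* total file references */
	int inflight;			/* references held by queued SCM_RIGHTS */
	struct toy_sock *child[4];	/* sockets reachable through embryos */
	int nchild;
};

/* Models __skb_queue_tail(L, skb1): child becomes reachable from parent. */
void toy_enqueue(struct toy_sock *parent, struct toy_sock *c)
{
	parent->child[parent->nchild++] = c;
}

/* One scan_children()-like pass: apply delta to each child's inflight. */
void toy_scan(struct toy_sock *u, int delta)
{
	for (int i = 0; i < u->nchild; i++)
		u->child[i]->inflight += delta;
}

/* Replays the GC passes and returns V's final inflight count.
 * enqueue_between_passes != 0 places the embryo between the two passes
 * (the race); 0 places it before the first pass (the benign order). */
int toy_gc_final_inflight(int enqueue_between_passes)
{
	struct toy_sock L = { .count = 1, .inflight = 1 };
	struct toy_sock V = { .count = 1, .inflight = 1 };

	if (!enqueue_between_passes)
		toy_enqueue(&L, &V);

	toy_scan(&L, -1);		/* pass 1: dec_inflight */

	if (enqueue_between_passes)
		toy_enqueue(&L, &V);	/* embryo lands mid-collection */

	if (L.inflight > 0)
		toy_scan(&L, +1);	/* pass 2: inc_inflight_move_tail */

	return V.inflight;
}
```

In this model the racy order ends with V at inflight=2 while count=1, matching the last line of the diagram; the benign order restores inflight=1 because both passes see the same child.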
    If there is a GC-candidate listening socket, lock and then unlock its state.
    This makes GC wait for any ongoing connect() to that socket to finish. Once
    the lock has been flipped, any possibly SCM-laden embryo is already enqueued,
    and any embryo that arrives later cannot carry SCM_RIGHTS: at this point
    unix_inflight() cannot run, because unix_gc_lock is already taken. The
    inflight graph remains unaffected.
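The lock flip described above works as a barrier, which the following userspace sketch illustrates with pthreads. It is an analogy under stated assumptions rather than the kernel change itself: struct toy_listener, toy_gc_thread() and toy_run_overlap() are hypothetical names, and the connecting side is assumed to hold the listener's state lock while it enqueues the embryo.

```c
#include <assert.h>
#include <pthread.h>

/* Toy stand-in for a listening socket. */
struct toy_listener {
	pthread_mutex_t state_lock;	/* plays the role of the state lock */
	int queued_embryos;		/* written only under state_lock */
	int seen_by_gc;			/* what GC observed after the flip */
};

/* GC side: lock and immediately unlock, then inspect the queue. */
void *toy_gc_thread(void *arg)
{
	struct toy_listener *l = arg;

	pthread_mutex_lock(&l->state_lock);	/* waits out an ongoing connect */
	pthread_mutex_unlock(&l->state_lock);

	/* The connector's unlock happened before our lock succeeded, so its
	 * enqueue is visible here. */
	l->seen_by_gc = l->queued_embryos;
	return NULL;
}

/* Overlaps one connect-like critical section with the GC thread.
 * Returns the number of embryos GC observed, or -1 on thread error. */
int toy_run_overlap(void)
{
	struct toy_listener l = { .queued_embryos = 0, .seen_by_gc = -1 };
	pthread_t gc;

	pthread_mutex_init(&l.state_lock, NULL);

	pthread_mutex_lock(&l.state_lock);	/* connect: take state lock */
	if (pthread_create(&gc, NULL, toy_gc_thread, &l) != 0) {
		pthread_mutex_unlock(&l.state_lock);
		pthread_mutex_destroy(&l.state_lock);
		return -1;
	}
	l.queued_embryos++;			/* connect: enqueue the embryo */
	pthread_mutex_unlock(&l.state_lock);	/* connect: finished */

	pthread_join(gc, NULL);
	pthread_mutex_destroy(&l.state_lock);
	return l.seen_by_gc;
}
```

Because the GC thread cannot acquire the lock until the connecting side releases it, the GC thread in this sketch always observes the embryo that was being enqueued when it started.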
    Fixes: 1fd05ba5 ("[AF_UNIX]: Rewrite garbage collector, fixes race.")
    Signed-off-by: Michal Luczaj <mhal@rbox.co>
    Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Link: https://lore.kernel.org/r/20240409201047.1032217-1-mhal@rbox.co
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
garbage.c 11.4 KB