    tcp: Migrate TCP_NEW_SYN_RECV requests at receiving the final ACK. · d4f2c86b
    Kuniyuki Iwashima authored
    This patch also changes the code to call reuseport_migrate_sock() and
    inet_reqsk_clone(), but unlike the other cases, we do not call
    inet_reqsk_clone() right after reuseport_migrate_sock().
    
    Currently, in the receive path for TCP_NEW_SYN_RECV sockets, the listener
    has three kinds of refcnt:
    
      (A) for the listener itself
      (B) carried by request_sock
      (C) taken by sock_hold() in tcp_v[46]_rcv()
    
    While processing the req, (A) may be released by close(listener). Also, (B)
    can be released by accept(listener) once we put the req into the accept
    queue. So, we have to hold another refcnt (C) on the listener to prevent
    use-after-free.
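    
    The need for (C) can be illustrated with a minimal userspace sketch. It
    is a toy model, not kernel code: struct toy_listener, toy_hold(),
    toy_put() and toy_listener_survives() are hypothetical names, and a
    "freed" flag stands in for actually freeing the socket.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a listening socket; not the kernel's struct sock. */
struct toy_listener {
	int refcnt;
	bool freed;
};

static void toy_hold(struct toy_listener *l)
{
	assert(!l->freed);
	l->refcnt++;
}

static void toy_put(struct toy_listener *l)
{
	assert(!l->freed && l->refcnt > 0);
	if (--l->refcnt == 0)
		l->freed = true;	/* a real socket would be freed here */
}

/*
 * Replay the sequence from the text: close() drops (A), accept() drops (B).
 * Returns true if the listener is still alive afterwards, which is only the
 * case when the receive path took its own reference (C).
 */
static bool toy_listener_survives(bool take_c)
{
	struct toy_listener l = { .refcnt = 1, .freed = false };	/* (A) */
	bool alive;

	toy_hold(&l);		/* (B) carried by the request */
	if (take_c)
		toy_hold(&l);	/* (C) taken by the receive path */

	toy_put(&l);		/* close(listener) drops (A) */
	toy_put(&l);		/* accept(listener) drops (B) */

	alive = !l.freed;
	if (take_c)
		toy_put(&l);	/* receive path drops (C) when done */
	return alive;
}
```

    In this model, toy_listener_survives(true) reports a live listener while
    toy_listener_survives(false) does not, which mirrors the use-after-free
    window that (C) closes.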
    
    For socket migration, we call reuseport_migrate_sock() to select a listener
    with (A) and to increment the new listener's refcnt in tcp_v[46]_rcv().
    This refcnt corresponds to (C) and is cleaned up later in tcp_v[46]_rcv().
    Thus we have to take another refcnt (B) for the newly cloned request_sock.
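    
    As a rough sketch of this accounting (again a userspace toy, with
    hypothetical helpers toy_migrate(), toy_clone_req() and toy_rx_migrate()
    standing in for the selection and clone steps, not the real kernel
    functions):

```c
#include <assert.h>
#include <stddef.h>

struct toy_sock { int refcnt; };
struct toy_req { struct toy_sock *listener; };

/*
 * Hypothetical stand-in for listener selection: hands back the new listener
 * with one extra reference, which plays the role of (C).
 */
static struct toy_sock *toy_migrate(struct toy_sock *new_sk)
{
	new_sk->refcnt++;
	return new_sk;
}

/* The cloned request takes its own reference, which plays the role of (B). */
static void toy_clone_req(struct toy_req *clone, struct toy_sock *new_sk)
{
	new_sk->refcnt++;
	clone->listener = new_sk;
}

/* Returns the new listener's refcnt once the receive path has finished. */
static int toy_rx_migrate(struct toy_sock *new_sk, struct toy_req *clone)
{
	struct toy_sock *sk = toy_migrate(new_sk);	/* +1: (C) */

	toy_clone_req(clone, sk);			/* +1: (B) */
	sk->refcnt--;					/* -1: (C) dropped on exit */
	return sk->refcnt;
}
```

    Starting from a listener that holds only (A), the model ends with a
    refcnt of 2, that is (A) plus (B). Without the separate (B), dropping
    (C) would leave the cloned request pointing at a listener it does not
    pin.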
    
    In inet_csk_complete_hashdance(), we take the refcnt (B), clone the req,
    and try to put the new req into the accept queue. By migrating the req
    only after winning the "own_req" race, we can avoid a worst-case situation
    such as:
    
      CPU 1 looks up req1
      CPU 2 looks up req1, unhashes it, then CPU 1 loses the race
      CPU 3 looks up req2, unhashes it, then CPU 2 loses the race
      ...
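    
    The ordering can be sketched with a sequential toy model in which several
    contenders process the same req and only the one that unhashes it (the
    "own_req" winner) goes on to clone. toy_unhash() and toy_clones_made()
    are hypothetical names, and real CPUs would of course race concurrently
    rather than run in a loop.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_req { bool hashed; };

/* Only the first caller finds the req still hashed and wins the race. */
static bool toy_unhash(struct toy_req *r)
{
	if (!r->hashed)
		return false;
	r->hashed = false;
	return true;
}

/*
 * Let `ncpus` contenders process the same req, cloning only after winning
 * the unhash race. Returns how many clones were made.
 */
static int toy_clones_made(int ncpus)
{
	struct toy_req req = { .hashed = true };
	int clones = 0;

	for (int cpu = 0; cpu < ncpus; cpu++) {
		bool own_req = toy_unhash(&req);

		if (own_req)
			clones++;	/* migrate only as the winner */
	}
	return clones;
}
```

    However many contenders there are, at most one clone is created in this
    model, so losers never leave behind a migrated req for yet another CPU
    to pick up.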
    
    Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Acked-by: Martin KaFai Lau <kafai@fb.com>
    Link: https://lore.kernel.org/bpf/20210612123224.12525-8-kuniyu@amazon.co.jp
tcp_minisocks.c 26.7 KB