• John Fastabend's avatar
    bpf, sockmap: RCU splat with redirect and strparser error or TLS · 93dd5f18
    John Fastabend authored
    There are two paths to generate the below RCU splat the first and
    most obvious is the result of the BPF verdict program issuing a
    redirect on a TLS socket (This is the splat shown below). Unlike
    the non-TLS case the caller of the *strp_read() hooks does not
    wrap the call in a rcu_read_lock/unlock. Then if the BPF program
    issues a redirect action we hit the RCU splat.
    
    However, in the non-TLS socket case the splat appears to be
    relatively rare, because the skmsg caller into the strp_data_ready()
    is wrapped in a rcu_read_lock/unlock. Shown here,
    
     static void sk_psock_strp_data_ready(struct sock *sk)
     {
    	struct sk_psock *psock;
    
    	rcu_read_lock();
    	psock = sk_psock(sk);
    	if (likely(psock)) {
    		if (tls_sw_has_ctx_rx(sk)) {
    			psock->parser.saved_data_ready(sk);
    		} else {
    			write_lock_bh(&sk->sk_callback_lock);
    			strp_data_ready(&psock->parser.strp);
    			write_unlock_bh(&sk->sk_callback_lock);
    		}
    	}
    	rcu_read_unlock();
     }
    
    If the above was the only way to run the verdict program we
    would be safe. But, there is a case where the strparser may throw an
    ENOMEM error while parsing the skb. This is a result of a failed
    skb_clone, or alloc_skb_for_msg while building a new merged skb when
    the msg length needed spans multiple skbs. This will in turn put the
    skb on the strp_wrk workqueue in the strparser code. The skb will
    later be dequeued and verdict programs run, but now from a
    different context without the rcu_read_lock()/unlock() critical
    section in sk_psock_strp_data_ready() shown above. In practice
    I have not seen this yet, because as far as I know most users of the
    verdict programs are also only working on single skbs. In this case no
    merge happens which could trigger the above ENOMEM errors. In addition
    the system would need to be under memory pressure. For example, we
    can't hit the above case in selftests because we missed having tests
    to merge skbs. (Added in later patch)
    
    To fix the below splat extend the rcu_read_lock/unnlock block to
    include the call to sk_psock_tls_verdict_apply(). This will fix both
    TLS redirect case and non-TLS redirect+error case. Also remove
    psock from the sk_psock_tls_verdict_apply() function signature its
    not used there.
    
    [ 1095.937597] WARNING: suspicious RCU usage
    [ 1095.940964] 5.7.0-rc7-02911-g463bac5f #1 Tainted: G        W
    [ 1095.944363] -----------------------------
    [ 1095.947384] include/linux/skmsg.h:284 suspicious rcu_dereference_check() usage!
    [ 1095.950866]
    [ 1095.950866] other info that might help us debug this:
    [ 1095.950866]
    [ 1095.957146]
    [ 1095.957146] rcu_scheduler_active = 2, debug_locks = 1
    [ 1095.961482] 1 lock held by test_sockmap/15970:
    [ 1095.964501]  #0: ffff9ea6b25de660 (sk_lock-AF_INET){+.+.}-{0:0}, at: tls_sw_recvmsg+0x13a/0x840 [tls]
    [ 1095.968568]
    [ 1095.968568] stack backtrace:
    [ 1095.975001] CPU: 1 PID: 15970 Comm: test_sockmap Tainted: G        W         5.7.0-rc7-02911-g463bac5f #1
    [ 1095.977883] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
    [ 1095.980519] Call Trace:
    [ 1095.982191]  dump_stack+0x8f/0xd0
    [ 1095.984040]  sk_psock_skb_redirect+0xa6/0xf0
    [ 1095.986073]  sk_psock_tls_strp_read+0x1d8/0x250
    [ 1095.988095]  tls_sw_recvmsg+0x714/0x840 [tls]
    
    v2: Improve commit message to identify non-TLS redirect plus error case
        condition as well as more common TLS case. In the process I decided
        doing the rcu_read_unlock followed by the lock/unlock inside branches
        was unnecessarily complex. We can just extend the current rcu block
        and get the same effeective without the shuffling and branching.
        Thanks Martin!
    
    Fixes: e91de6af ("bpf: Fix running sk_skb program types with ktls")
    Reported-by: default avatarJakub Sitnicki <jakub@cloudflare.com>
    Reported-by: default avatarkernel test robot <rong.a.chen@intel.com>
    Signed-off-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
    Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
    Acked-by: default avatarMartin KaFai Lau <kafai@fb.com>
    Acked-by: default avatarJakub Sitnicki <jakub@cloudflare.com>
    Link: https://lore.kernel.org/bpf/159312677907.18340.11064813152758406626.stgit@john-XPS-13-9370
    93dd5f18
skmsg.c 20.1 KB