1. 29 Sep, 2023 6 commits
    • Daniel Borkmann's avatar
      selftest/bpf: Add various selftests for program limits · d1a783da
      Daniel Borkmann authored
      Add various tests to check maximum number of supported programs
      being attached:
      
        # ./vmtest.sh -- ./test_progs -t tc_opts
        [...]
        ./test_progs -t tc_opts
        [    1.185325] bpf_testmod: loading out-of-tree module taints kernel.
        [    1.186826] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel
        [    1.270123] tsc: Refined TSC clocksource calibration: 3407.988 MHz
        [    1.272428] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x311fc932722, max_idle_ns: 440795381586 ns
        [    1.276408] clocksource: Switched to clocksource tsc
        #252     tc_opts_after:OK
        #253     tc_opts_append:OK
        #254     tc_opts_basic:OK
        #255     tc_opts_before:OK
        #256     tc_opts_chain_classic:OK
        #257     tc_opts_chain_mixed:OK
        #258     tc_opts_delete_empty:OK
        #259     tc_opts_demixed:OK
        #260     tc_opts_detach:OK
        #261     tc_opts_detach_after:OK
        #262     tc_opts_detach_before:OK
        #263     tc_opts_dev_cleanup:OK
        #264     tc_opts_invalid:OK
        #265     tc_opts_max:OK              <--- (new test)
        #266     tc_opts_mixed:OK
        #267     tc_opts_prepend:OK
        #268     tc_opts_replace:OK
        #269     tc_opts_revision:OK
        Summary: 18/0 PASSED, 0 SKIPPED, 0 FAILED
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20230929204121.20305-2-daniel@iogearbox.net
      d1a783da
    • Daniel Borkmann's avatar
      bpf, mprog: Fix maximum program check on mprog attachment · f9b0e108
      Daniel Borkmann authored
      After Paul's recent improvement to syzkaller to improve coverage for
      bpf_mprog and tcx, it hit a splat that the program limit was surpassed.
      What happened is that the maximum number of progs got added, followed
      by another prog add request which adds with BPF_F_BEFORE flag relative
      to the last program in the array. The idx >= bpf_mprog_max() check in
      bpf_mprog_attach() still passes because the index is below the maximum
      but the maximum will be surpassed. We need to add a check upfront for
      insertions to catch this situation.
      
      Fixes: 053c8e1f ("bpf: Add generic attach/detach/query API for multi-progs")
      Reported-by: syzbot+baa44e3dbbe48e05c1ad@syzkaller.appspotmail.com
      Reported-by: syzbot+b97d20ed568ce0951a06@syzkaller.appspotmail.com
      Reported-by: syzbot+2558ca3567a77b7af4e3@syzkaller.appspotmail.com
      Co-developed-by: default avatarNikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: default avatarNikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Tested-by: syzbot+baa44e3dbbe48e05c1ad@syzkaller.appspotmail.com
      Tested-by: syzbot+b97d20ed568ce0951a06@syzkaller.appspotmail.com
      Link: https://github.com/google/syzkaller/pull/4207
      Link: https://lore.kernel.org/bpf/20230929204121.20305-1-daniel@iogearbox.net
      f9b0e108
    • Jakub Sitnicki's avatar
      bpf, sockmap: Reject sk_msg egress redirects to non-TCP sockets · b80e31ba
      Jakub Sitnicki authored
      With a SOCKMAP/SOCKHASH map and an sk_msg program user can steer messages
      sent from one TCP socket (s1) to actually egress from another TCP
      socket (s2):
      
      tcp_bpf_sendmsg(s1)		// = sk_prot->sendmsg
        tcp_bpf_send_verdict(s1)	// __SK_REDIRECT case
          tcp_bpf_sendmsg_redir(s2)
            tcp_bpf_push_locked(s2)
      	tcp_bpf_push(s2)
      	  tcp_rate_check_app_limited(s2) // expects tcp_sock
      	  tcp_sendmsg_locked(s2)	 // ditto
      
      There is a hard-coded assumption in the call-chain, that the egress
      socket (s2) is a TCP socket.
      
      However in commit 122e6c79 ("sock_map: Update sock type checks for
      UDP") we have enabled redirects to non-TCP sockets. This was done for the
      sake of BPF sk_skb programs. There was no indention to support sk_msg
      send-to-egress use case.
      
      As a result, attempts to send-to-egress through a non-TCP socket lead to a
      crash due to invalid downcast from sock to tcp_sock:
      
       BUG: kernel NULL pointer dereference, address: 000000000000002f
       ...
       Call Trace:
        <TASK>
        ? show_regs+0x60/0x70
        ? __die+0x1f/0x70
        ? page_fault_oops+0x80/0x160
        ? do_user_addr_fault+0x2d7/0x800
        ? rcu_is_watching+0x11/0x50
        ? exc_page_fault+0x70/0x1c0
        ? asm_exc_page_fault+0x27/0x30
        ? tcp_tso_segs+0x14/0xa0
        tcp_write_xmit+0x67/0xce0
        __tcp_push_pending_frames+0x32/0xf0
        tcp_push+0x107/0x140
        tcp_sendmsg_locked+0x99f/0xbb0
        tcp_bpf_push+0x19d/0x3a0
        tcp_bpf_sendmsg_redir+0x55/0xd0
        tcp_bpf_send_verdict+0x407/0x550
        tcp_bpf_sendmsg+0x1a1/0x390
        inet_sendmsg+0x6a/0x70
        sock_sendmsg+0x9d/0xc0
        ? sockfd_lookup_light+0x12/0x80
        __sys_sendto+0x10e/0x160
        ? syscall_enter_from_user_mode+0x20/0x60
        ? __this_cpu_preempt_check+0x13/0x20
        ? lockdep_hardirqs_on+0x82/0x110
        __x64_sys_sendto+0x1f/0x30
        do_syscall_64+0x38/0x90
        entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      Reject selecting a non-TCP sockets as redirect target from a BPF sk_msg
      program to prevent the crash. When attempted, user will receive an EACCES
      error from send/sendto/sendmsg() syscall.
      
      Fixes: 122e6c79 ("sock_map: Update sock type checks for UDP")
      Signed-off-by: default avatarJakub Sitnicki <jakub@cloudflare.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20230920102055.42662-1-jakub@cloudflare.com
      b80e31ba
    • John Fastabend's avatar
      bpf, sockmap: Add tests for MSG_F_PEEK · 5f405c0c
      John Fastabend authored
      Test that we can read with MSG_F_PEEK and then still get correct number
      of available bytes through FIONREAD. The recv() (without PEEK) then
      returns the bytes as expected. The recv() always worked though because
      it was just the available byte reporting that was broke before latest
      fixes.
      Signed-off-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: default avatarJakub Sitnicki <jakub@cloudflare.com>
      Link: https://lore.kernel.org/bpf/20230926035300.135096-4-john.fastabend@gmail.com
      5f405c0c
    • John Fastabend's avatar
      bpf, sockmap: Do not inc copied_seq when PEEK flag set · da9e915e
      John Fastabend authored
      When data is peek'd off the receive queue we shouldn't considered it
      copied from tcp_sock side. When we increment copied_seq this will confuse
      tcp_data_ready() because copied_seq can be arbitrarily increased. From
      application side it results in poll() operations not waking up when
      expected.
      
      Notice tcp stack without BPF recvmsg programs also does not increment
      copied_seq.
      
      We broke this when we moved copied_seq into recvmsg to only update when
      actual copy was happening. But, it wasn't working correctly either before
      because the tcp_data_ready() tried to use the copied_seq value to see
      if data was read by user yet. See fixes tags.
      
      Fixes: e5c6de5f ("bpf, sockmap: Incorrectly handling copied_seq")
      Fixes: 04919bed ("tcp: Introduce tcp_read_skb()")
      Signed-off-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: default avatarJakub Sitnicki <jakub@cloudflare.com>
      Link: https://lore.kernel.org/bpf/20230926035300.135096-3-john.fastabend@gmail.com
      da9e915e
    • John Fastabend's avatar
      bpf: tcp_read_skb needs to pop skb regardless of seq · 9b7177b1
      John Fastabend authored
      Before fix e5c6de5f tcp_read_skb() would increment the tp->copied-seq
      value. This (as described in the commit) would cause an error for apps
      because once that is incremented the application might believe there is no
      data to be read. Then some apps would stall or abort believing no data is
      available.
      
      However, the fix is incomplete because it introduces another issue in
      the skb dequeue. The loop does tcp_recv_skb() in a while loop to consume
      as many skbs as possible. The problem is the call is ...
      
        tcp_recv_skb(sk, seq, &offset)
      
      ... where 'seq' is:
      
        u32 seq = tp->copied_seq;
      
      Now we can hit a case where we've yet incremented copied_seq from BPF side,
      but then tcp_recv_skb() fails this test ...
      
       if (offset < skb->len || (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN))
      
      ... so that instead of returning the skb we call tcp_eat_recv_skb() which
      frees the skb. This is because the routine believes the SKB has been collapsed
      per comment:
      
       /* This looks weird, but this can happen if TCP collapsing
        * splitted a fat GRO packet, while we released socket lock
        * in skb_splice_bits()
        */
      
      This can't happen here we've unlinked the full SKB and orphaned it. Anyways
      it would confuse any BPF programs if the data were suddenly moved underneath
      it.
      
      To fix this situation do simpler operation and just skb_peek() the data
      of the queue followed by the unlink. It shouldn't need to check this
      condition and tcp_read_skb() reads entire skbs so there is no need to
      handle the 'offset!=0' case as we would see in tcp_read_sock().
      
      Fixes: e5c6de5f ("bpf, sockmap: Incorrectly handling copied_seq")
      Fixes: 04919bed ("tcp: Introduce tcp_read_skb()")
      Signed-off-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: default avatarJakub Sitnicki <jakub@cloudflare.com>
      Link: https://lore.kernel.org/bpf/20230926035300.135096-2-john.fastabend@gmail.com
      9b7177b1
  2. 20 Sep, 2023 1 commit
  3. 19 Sep, 2023 5 commits
  4. 18 Sep, 2023 22 commits
  5. 17 Sep, 2023 3 commits
  6. 16 Sep, 2023 3 commits