1. 16 Sep, 2023 7 commits
  2. 15 Sep, 2023 6 commits
  3. 14 Sep, 2023 13 commits
  4. 13 Sep, 2023 1 commit
  5. 12 Sep, 2023 4 commits
    • Merge branch 'bpf-x64-fix-tailcall-infinite-loop' · 5bbb9e1f
      Alexei Starovoitov authored
      Leon Hwang says:
      
      ====================
      bpf, x64: Fix tailcall infinite loop
      
      This patch series fixes a tailcall infinite loop on x64.
      
      Since commit ebf7d1f5 ("bpf, x64: rework pro/epilogue and tailcall
      handling in JIT"), tailcalls on x64 have worked better than before.
      
      Since commit e411901c ("bpf: allow for tailcalls in BPF subprograms
      for x64 JIT"), tailcalls can run in BPF subprograms on x64.
      
      Since commit 5b92a28a ("bpf: Support attaching tracing BPF program
      to other BPF programs"), a BPF program can trace other BPF programs.
      
      What happens when all three are combined?
      
      1. FENTRY/FEXIT on a BPF subprogram.
      2. A tailcall runs in the BPF subprogram.
      3. The tailcall calls the subprogram's caller.
      
      The result is a tailcall infinite loop, and the loop halts the machine.
      
      In tail call context, tail_call_cnt is propagated between BPF
      subprograms via the stack and the rax register, and trampolines must
      do the same.
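
      For illustration, here is a minimal, hypothetical sketch of the looping
      scenario; the program and map names are invented and this is not the
      actual selftest source:

        /* Hypothetical sketch: entry_prog calls subprog_tail() via bpf2bpf,
         * and subprog_tail() tail-calls back into entry_prog.
         */
        #include <linux/bpf.h>
        #include <bpf/bpf_helpers.h>

        struct {
                __uint(type, BPF_MAP_TYPE_PROG_ARRAY);
                __uint(max_entries, 1);
                __uint(key_size, sizeof(__u32));
                __uint(value_size, sizeof(__u32));
        } jmp_table SEC(".maps");

        static __noinline int subprog_tail(struct __sk_buff *skb)
        {
                /* Slot 0 is filled from user space with entry_prog itself,
                 * so this tail call jumps back into the caller.
                 */
                bpf_tail_call(skb, &jmp_table, 0);
                return 0;
        }

        SEC("tc")
        int entry_prog(struct __sk_buff *skb)
        {
                return subprog_tail(skb); /* bpf2bpf call into the subprogram */
        }

        /* Attaching an FENTRY/FEXIT program to subprog_tail() inserts a
         * trampoline between entry_prog and subprog_tail. Before the fix,
         * that trampoline did not propagate tail_call_cnt, so the
         * MAX_TAIL_CALL_CNT limit never fired and the chain looped forever.
         */
        char _license[] SEC("license") = "GPL";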
      
      How did I discover the bug?

      Since commit 7f6e4312 ("bpf: Limit caller's stack depth 256 for
      subprogs with tailcalls"), the total stack size is limited to around
      8KiB. So I wrote some bpf progs to validate the stack consumption:
      tailcalls running in bpf2bpf, with FENTRY/FEXIT tracing on the bpf2bpf
      calls.

      While doing so, I accidentally created a tailcall loop, and the loop
      halted my VM. Without the loop, the bpf progs would consume over 8KiB
      of stack, yet that stack overflow did not halt my VM.

      With bpf_printk(), I confirmed that the tailcall count limit was not
      working as expected. I then read the code and fixed it.
      
      Thanks to Ilya Leoshkevich, this bug has already been fixed on s390x.
      
      Hopefully, this bug will be fixed on arm64 in the near future.
      ====================
      
      Link: https://lore.kernel.org/r/20230912150442.2009-1-hffilwlqm@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: Add testcases for tailcall infinite loop fixing · e13b5f2f
      Leon Hwang authored
      Add 4 test cases to confirm that the tailcall infinite loop bug has
      been fixed.

      Like the tailcall_bpf2bpf cases, they attach fentry/fexit to the
      bpf2bpf subprogram and then check the final count result.
      
      tools/testing/selftests/bpf/test_progs -t tailcalls
      226/13  tailcalls/tailcall_bpf2bpf_fentry:OK
      226/14  tailcalls/tailcall_bpf2bpf_fexit:OK
      226/15  tailcalls/tailcall_bpf2bpf_fentry_fexit:OK
      226/16  tailcalls/tailcall_bpf2bpf_fentry_entry:OK
      226     tailcalls:OK
      Summary: 1/16 PASSED, 0 SKIPPED, 0 FAILED
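
      For context, a hedged sketch of how user space might wire slot 0 of the
      prog array back to the entry program, so that the subprogram's tail call
      re-enters its own caller (hypothetical names; not the actual selftest
      code):

        #include <bpf/bpf.h>
        #include <bpf/libbpf.h>

        /* Hypothetical wiring helper: point slot 0 of the prog array back
         * at the entry program, so subprog_tail()'s tail call re-enters
         * its own caller.
         */
        static int wire_tailcall_loop(struct bpf_object *obj)
        {
                struct bpf_program *entry =
                        bpf_object__find_program_by_name(obj, "entry_prog");
                struct bpf_map *jmp_table =
                        bpf_object__find_map_by_name(obj, "jmp_table");
                __u32 key = 0;
                __u32 prog_fd = bpf_program__fd(entry);

                /* With the fix, running entry_prog with fentry/fexit attached
                 * to subprog_tail() stops once tail_call_cnt reaches
                 * MAX_TAIL_CALL_CNT instead of looping forever.
                 */
                return bpf_map_update_elem(bpf_map__fd(jmp_table), &key,
                                           &prog_fd, BPF_ANY);
        }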
      Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>
      Link: https://lore.kernel.org/r/20230912150442.2009-4-hffilwlqm@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf, x64: Fix tailcall infinite loop · 2b5dcb31
      Leon Hwang authored
      Since commit ebf7d1f5 ("bpf, x64: rework pro/epilogue and tailcall
      handling in JIT"), tailcalls on x64 have worked better than before.

      Since commit e411901c ("bpf: allow for tailcalls in BPF subprograms
      for x64 JIT"), tailcalls can run in BPF subprograms on x64.

      Since commit 5b92a28a ("bpf: Support attaching tracing BPF program
      to other BPF programs"), a BPF program can trace other BPF programs.

      What happens when all three are combined?

      1. FENTRY/FEXIT on a BPF subprogram.
      2. A tailcall runs in the BPF subprogram.
      3. The tailcall calls the subprogram's caller.

      The result is a tailcall infinite loop, and the loop halts the machine.

      In tail call context, tail_call_cnt is propagated between BPF
      subprograms via the stack and the rax register, and trampolines must
      do the same.
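
      As a reminder of what the counter protects, here is a simplified sketch
      of the tail call bound (a paraphrase for illustration; not the kernel's
      exact interpreter or JIT code):

        #define MAX_TAIL_CALL_CNT 33    /* upper bound on chained tail calls */

        /* Paraphrased check performed on every tail call: if the counter is
         * not propagated correctly, each re-entry through the trampoline
         * starts from zero and this bound never triggers.
         */
        static int do_tail_call(unsigned int *tail_call_cnt)
        {
                if (*tail_call_cnt >= MAX_TAIL_CALL_CNT)
                        return -1;      /* fall through, do not jump */
                (*tail_call_cnt)++;
                return 0;               /* jump to the target program */
        }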
      
      Fixes: ebf7d1f5 ("bpf, x64: rework pro/epilogue and tailcall handling in JIT")
      Fixes: e411901c ("bpf: allow for tailcalls in BPF subprograms for x64 JIT")
      Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>
      Link: https://lore.kernel.org/r/20230912150442.2009-3-hffilwlqm@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf, x64: Comment tail_call_cnt initialisation · 2bee9770
      Leon Hwang authored
      Without understanding emit_prologue(), it is really hard to figure out
      where tail_call_cnt comes from, even when searching for tail_call_cnt
      across the whole kernel repo.

      These comments make it a little easier to understand how tail_call_cnt
      is initialised.
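
      As a rough illustration of the rule those comments spell out (a
      simplified, hypothetical sketch, not the actual emit_prologue() code):
      only the entry program of a tailcall chain initialises the counter,
      while subprograms reuse the value handed to them by their caller.

        #include <stdbool.h>

        enum tcc_init {
                NO_TAIL_CALLS,          /* program never tail calls */
                INIT_TAIL_CALL_CNT,     /* entry program: start counting at 0 */
                REUSE_CALLERS_TCC,      /* subprogram: keep caller's count */
        };

        /* Hypothetical helper mirroring the decision documented in the
         * emit_prologue() comments.
         */
        static enum tcc_init tail_call_cnt_init(bool tail_call_reachable,
                                                bool is_subprog)
        {
                if (!tail_call_reachable)
                        return NO_TAIL_CALLS;
                if (!is_subprog)
                        return INIT_TAIL_CALL_CNT;   /* rax is zeroed here */
                return REUSE_CALLERS_TCC;            /* rax/stack carry the count */
        }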
      Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>
      Link: https://lore.kernel.org/r/20230912150442.2009-2-hffilwlqm@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  6. 11 Sep, 2023 1 commit
  7. 09 Sep, 2023 2 commits
  8. 08 Sep, 2023 6 commits
    • a28b1ba2
      Rong Tao authored
    • selftests/bpf: trace_helpers.c: Optimize kallsyms cache · c698eaeb
      Rong Tao authored
      The static ksyms cache often runs into problems because the number of
      symbols exceeds the MAX_SYMS limit. Bumping MAX_SYMS from 300000 to
      400000 in commit e76a0143 ("selftests/bpf: Bump and validate MAX_SYMS")
      mitigates the problem somewhat, but it is not a complete fix.

      This commit switches to dynamic memory allocation, which removes the
      limit on the number of kallsyms entirely. At the same time, add the
      following APIs:

          load_kallsyms_local()
          ksym_search_local()
          ksym_get_addr_local()
          free_kallsyms_local()

      These are used so that selftests/bpf can refresh kallsyms after new
      symbols are attached during testmod testing.
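
      A hedged usage sketch of the local APIs (the prototypes are assumed to
      be roughly as used below; see tools/testing/selftests/bpf/trace_helpers.h
      for the authoritative declarations):

        #include <stdio.h>
        #include "trace_helpers.h"

        /* Load a private, dynamically sized kallsyms cache, resolve one
         * address and one name, then free the cache again.
         */
        static int dump_one_symbol(const char *name, long addr)
        {
                struct ksyms *ksyms = load_kallsyms_local();
                struct ksym *sym;

                if (!ksyms)
                        return -1;

                sym = ksym_search_local(ksyms, addr);
                if (sym)
                        printf("0x%lx -> %s\n", addr, sym->name);

                printf("%s -> 0x%lx\n", name, ksym_get_addr_local(ksyms, name));

                free_kallsyms_local(ksyms);   /* no global state left behind */
                return 0;
        }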
      Signed-off-by: Rong Tao <rongtao@cestc.cn>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Acked-by: Stanislav Fomichev <sdf@google.com>
      Link: https://lore.kernel.org/bpf/tencent_C9BDA68F9221F21BE4081566A55D66A9700A@qq.com
    • Merge branch 'bpf-task_group_seq_get_next-misc-cleanups' · 9bc86925
      Alexei Starovoitov authored
      Oleg Nesterov says:
      
      ====================
      bpf: task_group_seq_get_next: misc cleanups
      
      Yonghong,
      
      I am resending patches 1-5 of 6, as you suggested, with your acks
      included.
      
      The next (final) patch will change this code to use __next_thread when
      
      	https://lore.kernel.org/all/20230824143142.GA31222@redhat.com/
      
      is merged.
      
      Oleg.
      ====================
      
      Link: https://lore.kernel.org/r/20230905154612.GA24872@redhat.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • Merge branch 'bpf-enable-irq-after-irq_work_raise-completes' · 35897c3c
      Alexei Starovoitov authored
      Hou Tao says:
      
      ====================
      bpf: Enable IRQ after irq_work_raise() completes
      
      From: Hou Tao <houtao1@huawei.com>
      
      Hi,
      
      This patchset aims to fix the problem that bpf_mem_alloc() may
      unexpectedly return NULL when multiple bpf_mem_alloc() calls run
      concurrently in process context even though free memory is still
      available. The problem was found while stress-testing qp-trie, but the
      same problem also exists for bpf_obj_new(), as demonstrated in patch #3.
      
      As pointed out by Alexei, the patchset can only fix the ENOMEM problem
      for normal process context; it cannot fix it for irq-disabled contexts
      or on an RT-enabled kernel.
      
      Patch #1 fixes the race between unit_alloc() and unit_alloc(). Patch #2
      fixes the race between unit_alloc() and unit_free(). Patch #3 adds a
      selftest for the problem. The major change compared with v1 is using a
      local_irq_{save,restore}() pair to disable and enable preemption
      instead of a preempt_{disable,enable}_notrace() pair, mainly to avoid
      the potential overhead of __preempt_schedule_notrace(). I also ran the
      htab_mem benchmark and hash_map_perf on an 8-CPU KVM VM to compare the
      performance of local_irq_{save,restore} and
      preempt_{disable,enable}_notrace(), but the results are similar, as
      shown below:
      
      (1) use preempt_{disable,enable}_notrace()
      
      [root@hello bpf]# ./map_perf_test 4 8
      0:hash_map_perf kmalloc 652179 events per sec
      1:hash_map_perf kmalloc 651880 events per sec
      2:hash_map_perf kmalloc 651382 events per sec
      3:hash_map_perf kmalloc 650791 events per sec
      5:hash_map_perf kmalloc 650140 events per sec
      6:hash_map_perf kmalloc 652773 events per sec
      7:hash_map_perf kmalloc 652751 events per sec
      4:hash_map_perf kmalloc 648199 events per sec
      
      [root@hello bpf]# ./benchs/run_bench_htab_mem.sh
      normal bpf ma
      =============
      overwrite            per-prod-op: 110.82 ± 0.02k/s, avg mem: 2.00 ± 0.00MiB, peak mem: 2.73MiB
      batch_add_batch_del  per-prod-op: 89.79 ± 0.75k/s, avg mem: 1.68 ± 0.38MiB, peak mem: 2.73MiB
      add_del_on_diff_cpu  per-prod-op: 17.83 ± 0.07k/s, avg mem: 25.68 ± 2.92MiB, peak mem: 35.10MiB
      
      (2) use local_irq_{save,restore}
      
      [root@hello bpf]# ./map_perf_test 4 8
      0:hash_map_perf kmalloc 656299 events per sec
      1:hash_map_perf kmalloc 656397 events per sec
      2:hash_map_perf kmalloc 656046 events per sec
      3:hash_map_perf kmalloc 655723 events per sec
      5:hash_map_perf kmalloc 655221 events per sec
      4:hash_map_perf kmalloc 654617 events per sec
      6:hash_map_perf kmalloc 650269 events per sec
      7:hash_map_perf kmalloc 653665 events per sec
      
      [root@hello bpf]# ./benchs/run_bench_htab_mem.sh
      normal bpf ma
      =============
      overwrite            per-prod-op: 116.10 ± 0.02k/s, avg mem: 2.00 ± 0.00MiB, peak mem: 2.74MiB
      batch_add_batch_del  per-prod-op: 88.76 ± 0.61k/s, avg mem: 1.94 ± 0.33MiB, peak mem: 2.74MiB
      add_del_on_diff_cpu  per-prod-op: 18.12 ± 0.08k/s, avg mem: 25.10 ± 2.70MiB, peak mem: 34.78MiB
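
      For reference, a hedged sketch of the v2 shape being compared above
      (placeholder helpers; not the actual unit_alloc() code): IRQs stay
      disabled until irq_work_raise() has returned, so a reentrant allocation
      on the same CPU cannot slip in between.

        #include <linux/irqflags.h>
        #include <linux/types.h>

        struct demo_cache;                               /* stand-in for bpf_mem_cache */
        void *demo_pop_free(struct demo_cache *c);       /* placeholder */
        bool demo_below_watermark(struct demo_cache *c); /* placeholder */
        void demo_irq_work_raise(struct demo_cache *c);  /* placeholder for irq_work_raise() */

        static void *demo_unit_alloc(struct demo_cache *c)
        {
                unsigned long flags;
                void *obj;

                local_irq_save(flags);            /* v2: blocks IRQs, not just preemption */
                obj = demo_pop_free(c);
                if (demo_below_watermark(c))
                        demo_irq_work_raise(c);   /* queue the refill irq_work */
                local_irq_restore(flags);         /* IRQs re-enabled only after it returns */

                return obj;
        }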
      
      As usual, comments are always welcome.
      
      Change Log:
      v2:
        * Use local_irq_save() to disable preemption instead of the
          preempt_{disable,enable}_notrace() pair, to prevent potential overhead
      
      v1: https://lore.kernel.org/bpf/20230822133807.3198625-1-houtao@huaweicloud.com/
      ====================
      
      Link: https://lore.kernel.org/r/20230901111954.1804721-1-houtao@huaweicloud.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: task_group_seq_get_next: simplify the "next tid" logic · 780aa8df
      Oleg Nesterov authored
      Kill saved_tid. It looks ugly to update *tid and then restore the
      previous value if __task_pid_nr_ns() returns 0. Change this code
      to update *tid and common->pid_visiting once before return.
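
      A generic illustration of the pattern (hypothetical code, not the actual
      task_group_seq_get_next() body): commit the new value only once the
      lookup has succeeded, instead of saving and restoring the old one.

        #include <stdbool.h>
        #include <sys/types.h>

        /* Hypothetical helper showing the "commit once on success" shape. */
        static bool commit_next_tid(pid_t *tid, pid_t next, bool lookup_ok)
        {
                if (!lookup_ok)
                        return false;   /* *tid is left untouched, no rollback needed */
                *tid = next;            /* single update on the success path */
                return true;
        }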
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Yonghong Song <yonghong.song@linux.dev>
      Link: https://lore.kernel.org/r/20230905154656.GA24950@redhat.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: Test preemption between bpf_obj_new() and bpf_obj_drop() · 29c11aa8
      Hou Tao authored
      The test case creates 4 threads and pins them to CPU 0. The threads
      run different bpf programs through bpf_prog_test_run_opts(), and those
      programs use bpf_obj_new() and bpf_obj_drop() to allocate and free
      local kptrs concurrently.
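
      A hedged sketch of that setup (hypothetical, not the actual selftest
      code): pin the workers to CPU 0 so they must preempt each other, then
      drive the BPF programs with bpf_prog_test_run_opts().

        #define _GNU_SOURCE
        #include <pthread.h>
        #include <sched.h>
        #include <bpf/bpf.h>

        static void *worker(void *arg)
        {
                int prog_fd = *(int *)arg;
                cpu_set_t set;
                LIBBPF_OPTS(bpf_test_run_opts, topts);

                CPU_ZERO(&set);
                CPU_SET(0, &set);                        /* all workers share CPU 0 */
                pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

                bpf_prog_test_run_opts(prog_fd, &topts); /* prog does bpf_obj_new()/bpf_obj_drop() */
                return NULL;
        }

        static void run_workers(int prog_fds[4])
        {
                pthread_t tids[4];

                for (int i = 0; i < 4; i++)
                        pthread_create(&tids[i], NULL, worker, &prog_fds[i]);
                for (int i = 0; i < 4; i++)
                        pthread_join(tids[i], NULL);
        }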
      
      Under a preemptible kernel, bpf_obj_new() and bpf_obj_drop() may
      preempt each other, bpf_obj_new() may return NULL, and without these
      fixes the test fails as shown below:
      
        test_preempted_bpf_ma_op:PASS:open_and_load 0 nsec
        test_preempted_bpf_ma_op:PASS:attach 0 nsec
        test_preempted_bpf_ma_op:PASS:no test prog 0 nsec
        test_preempted_bpf_ma_op:PASS:no test prog 0 nsec
        test_preempted_bpf_ma_op:PASS:no test prog 0 nsec
        test_preempted_bpf_ma_op:PASS:no test prog 0 nsec
        test_preempted_bpf_ma_op:PASS:pthread_create 0 nsec
        test_preempted_bpf_ma_op:PASS:pthread_create 0 nsec
        test_preempted_bpf_ma_op:PASS:pthread_create 0 nsec
        test_preempted_bpf_ma_op:PASS:pthread_create 0 nsec
        test_preempted_bpf_ma_op:PASS:run prog err 0 nsec
        test_preempted_bpf_ma_op:PASS:run prog err 0 nsec
        test_preempted_bpf_ma_op:PASS:run prog err 0 nsec
        test_preempted_bpf_ma_op:PASS:run prog err 0 nsec
        test_preempted_bpf_ma_op:FAIL:ENOMEM unexpected ENOMEM: got TRUE
        #168     preempted_bpf_ma_op:FAIL
        Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED
      Signed-off-by: Hou Tao <houtao1@huawei.com>
      Link: https://lore.kernel.org/r/20230901111954.1804721-4-houtao@huaweicloud.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>