1. 15 Sep, 2023 2 commits
  2. 14 Sep, 2023 13 commits
  3. 13 Sep, 2023 1 commit
  4. 12 Sep, 2023 4 commits
    • Merge branch 'bpf-x64-fix-tailcall-infinite-loop' · 5bbb9e1f
      Alexei Starovoitov authored
      Leon Hwang says:
      
      ====================
      bpf, x64: Fix tailcall infinite loop
      
      This patch series fixes a tailcall infinite loop on x64.
      
      Since commit ebf7d1f5 ("bpf, x64: rework pro/epilogue and tailcall
      handling in JIT"), tailcalls on x64 work better than before.
      
      Since commit e411901c ("bpf: allow for tailcalls in BPF subprograms
      for x64 JIT"), tailcalls are able to run in BPF subprograms on x64.
      
      Since commit 5b92a28a ("bpf: Support attaching tracing BPF program
      to other BPF programs"), a BPF program is able to trace other BPF programs.
      
      How about combining them all together?
      
      1. FENTRY/FEXIT on a BPF subprogram.
      2. A tailcall runs in the BPF subprogram.
      3. The tailcall calls the subprogram's caller.
      
      As a result, a tailcall infinite loop comes up, and the loop would halt
      the machine.
      
      As we know, in the tail call context, tail_call_cnt propagates via the
      stack and the rax register between BPF subprograms. The same must happen
      in trampolines.
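      
      For illustration, a minimal BPF-side sketch of that combination (names
      here are hypothetical; user space fills slot 0 of the prog array with
      the fd of 'entry' and attaches an FENTRY program to 'subprog', e.g. via
      bpf_program__set_attach_target()):
      
        #include <linux/bpf.h>
        #include <bpf/bpf_helpers.h>
      
        struct {
                __uint(type, BPF_MAP_TYPE_PROG_ARRAY);
                __uint(max_entries, 1);
                __uint(key_size, sizeof(__u32));
                __uint(value_size, sizeof(__u32));
        } jmp_table SEC(".maps");
      
        static __noinline int subprog(struct __sk_buff *skb)
        {
                /* slot 0 holds 'entry', so this jumps back to the caller */
                bpf_tail_call_static(skb, &jmp_table, 0);
                return 0;
        }
      
        SEC("tc")
        int entry(struct __sk_buff *skb)
        {
                return subprog(skb);    /* bpf2bpf call */
        }
      
        char _license[] SEC("license") = "GPL";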
      
      How did I discover the bug?
      
      Since commit 7f6e4312 ("bpf: Limit caller's stack depth 256 for
      subprogs with tailcalls"), the total stack size is limited to around 8KiB.
      So I wrote some bpf progs to validate the stack consumption: tailcalls
      running in bpf2bpf and FENTRY/FEXIT tracing on bpf2bpf.
      
      At that time, I accidentally made a tailcall loop, and the loop halted
      my VM. Without the loop, the bpf progs would consume over 8KiB of stack,
      but that _stack overflow_ did not halt my VM.
      
      With bpf_printk(), I confirmed that the tailcall count limit did not work
      as expected. Next, I read the code and fixed it.
      
      Thanks to Ilya Leoshkevich, this bug has been fixed on s390x.
      
      Hopefully, this bug on arm64 will be fixed in the near future.
      ====================
      
      Link: https://lore.kernel.org/r/20230912150442.2009-1-hffilwlqm@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      5bbb9e1f
    • selftests/bpf: Add testcases for tailcall infinite loop fixing · e13b5f2f
      Leon Hwang authored
      Add 4 test cases to confirm the tailcall infinite loop bug has been fixed.
      
      Like tailcall_bpf2bpf cases, do fentry/fexit on the bpf2bpf, and then
      check the final count result.
      
      tools/testing/selftests/bpf/test_progs -t tailcalls
      226/13  tailcalls/tailcall_bpf2bpf_fentry:OK
      226/14  tailcalls/tailcall_bpf2bpf_fexit:OK
      226/15  tailcalls/tailcall_bpf2bpf_fentry_fexit:OK
      226/16  tailcalls/tailcall_bpf2bpf_fentry_entry:OK
      226     tailcalls:OK
      Summary: 1/16 PASSED, 0 SKIPPED, 0 FAILED
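      
      A rough user-space sketch (not the actual selftest code; names are
      illustrative) of how an fentry/fexit program can be pointed at a bpf2bpf
      subprogram of an already loaded program via
      bpf_program__set_attach_target():
      
        #include <bpf/libbpf.h>
      
        static int attach_fentry_to_subprog(struct bpf_object *obj,
                                            const char *fentry_prog_name,
                                            int target_prog_fd)
        {
                struct bpf_program *prog;
      
                prog = bpf_object__find_program_by_name(obj, fentry_prog_name);
                if (!prog)
                        return -1;
      
                /* trace the bpf2bpf subprogram "subprog" of the target prog */
                if (bpf_program__set_attach_target(prog, target_prog_fd, "subprog"))
                        return -1;
      
                if (bpf_object__load(obj))
                        return -1;
      
                return bpf_program__attach(prog) ? 0 : -1;
        }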
      Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>
      Link: https://lore.kernel.org/r/20230912150442.2009-4-hffilwlqm@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      e13b5f2f
    • bpf, x64: Fix tailcall infinite loop · 2b5dcb31
      Leon Hwang authored
      Since commit ebf7d1f5 ("bpf, x64: rework pro/epilogue and tailcall
      handling in JIT"), tailcalls on x64 work better than before.
      
      Since commit e411901c ("bpf: allow for tailcalls in BPF subprograms
      for x64 JIT"), tailcalls are able to run in BPF subprograms on x64.
      
      Since commit 5b92a28a ("bpf: Support attaching tracing BPF program
      to other BPF programs"), a BPF program is able to trace other BPF programs.
      
      How about combining them all together?
      
      1. FENTRY/FEXIT on a BPF subprogram.
      2. A tailcall runs in the BPF subprogram.
      3. The tailcall calls the subprogram's caller.
      
      As a result, a tailcall infinite loop comes up, and the loop would halt
      the machine.
      
      As we know, in the tail call context, tail_call_cnt propagates via the
      stack and the rax register between BPF subprograms. The same must happen
      in trampolines.
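      
      For reference, the limit that such a loop bypasses is the tail call
      counter check; conceptually (a C-level sketch, not the actual emitted
      x64 assembly) every tail call site does roughly:
      
        /* MAX_TAIL_CALL_CNT is 33 in include/linux/bpf.h */
        if (tail_call_cnt >= MAX_TAIL_CALL_CNT)
                goto fall_through;      /* give up, continue after the call */
        tail_call_cnt++;
        /* ... jump into the target program ... */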
      
      Fixes: ebf7d1f5 ("bpf, x64: rework pro/epilogue and tailcall handling in JIT")
      Fixes: e411901c ("bpf: allow for tailcalls in BPF subprograms for x64 JIT")
      Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>
      Link: https://lore.kernel.org/r/20230912150442.2009-3-hffilwlqm@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      2b5dcb31
    • bpf, x64: Comment tail_call_cnt initialisation · 2bee9770
      Leon Hwang authored
      Without understanding emit_prologue(), it is really hard to figure out
      where tail_call_cnt comes from, even when searching for tail_call_cnt
      in the whole kernel repo.
      
      By adding these comments, it becomes a little bit easier to understand
      the tail_call_cnt initialisation.
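      
      For context, a rough sketch of the behaviour being documented (not the
      exact emitted instructions):
      
        /*
         * emit_prologue(), roughly:
         *   main prog,  tail call reachable: push a zeroed slot that acts as
         *                                    tail_call_cnt for this run;
         *   subprogram, tail call reachable: push rax, which carries the
         *                                    caller's tail_call_cnt.
         */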
      Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>
      Link: https://lore.kernel.org/r/20230912150442.2009-2-hffilwlqm@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      2bee9770
  5. 11 Sep, 2023 1 commit
  6. 09 Sep, 2023 2 commits
  7. 08 Sep, 2023 17 commits
    • a28b1ba2
    • selftests/bpf: trace_helpers.c: Optimize kallsyms cache · c698eaeb
      Rong Tao authored
      The static ksyms array often has problems because the number of symbols
      exceeds the MAX_SYMS limit. Bumping MAX_SYMS from 300000 to 400000 in
      commit e76a0143 ("selftests/bpf: Bump and validate MAX_SYMS") solves
      the problem somewhat, but it is not the perfect way.
      
      This commit uses dynamic memory allocation, which completely solves the
      problem caused by the limitation of the number of kallsyms. At the same
      time, add APIs:
      
          load_kallsyms_local()
          ksym_search_local()
          ksym_get_addr_local()
          free_kallsyms_local()
      
      These are used to solve the problem of selftests/bpf updating kallsyms
      after attaching new symbols during testmod testing.
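      
      A usage sketch (the exact signatures live in trace_helpers.h; the ones
      assumed here follow the description above):
      
        #include <stdio.h>
        #include "trace_helpers.h"
      
        static void ksyms_example(long addr)
        {
                struct ksyms *ksyms = load_kallsyms_local();
                struct ksym *sym;
      
                if (!ksyms)
                        return;
      
                sym = ksym_search_local(ksyms, addr);   /* addr -> symbol */
                if (sym)
                        printf("%lx is %s\n", addr, sym->name);
      
                free_kallsyms_local(ksyms);             /* drop the private copy */
        }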
      Signed-off-by: Rong Tao <rongtao@cestc.cn>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Acked-by: Stanislav Fomichev <sdf@google.com>
      Link: https://lore.kernel.org/bpf/tencent_C9BDA68F9221F21BE4081566A55D66A9700A@qq.com
      c698eaeb
    • Merge branch 'bpf-task_group_seq_get_next-misc-cleanups' · 9bc86925
      Alexei Starovoitov authored
      Oleg Nesterov says:
      
      ====================
      bpf: task_group_seq_get_next: misc cleanups
      
      Yonghong,
      
      I am resending 1-5 of 6 as you suggested with your acks included.
      
      The next (final) patch will change this code to use __next_thread when
      
      	https://lore.kernel.org/all/20230824143142.GA31222@redhat.com/
      
      is merged.
      
      Oleg.
      ====================
      
      Link: https://lore.kernel.org/r/20230905154612.GA24872@redhat.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      9bc86925
    • Merge branch 'bpf-enable-irq-after-irq_work_raise-completes' · 35897c3c
      Alexei Starovoitov authored
      Hou Tao says:
      
      ====================
      bpf: Enable IRQ after irq_work_raise() completes
      
      From: Hou Tao <houtao1@huawei.com>
      
      Hi,
      
      The patchset aims to fix the problem that bpf_mem_alloc() may return
      NULL unexpectedly when multiple bpf_mem_alloc() are invoked concurrently
      under process context and there is still free memory available. The
      problem was found when doing stress test for qp-trie but the same
      problem also exists for bpf_obj_new() as demonstrated in patch #3.
      
      As pointed out by Alexei, the patchset can only fix ENOMEM problem for
      normal process context and can not fix the problem for irq-disabled
      context or RT-enabled kernel.
      
      Patch #1 fixes the race between unit_alloc() and unit_alloc(). Patch #2
      fixes the race between unit_alloc() and unit_free(). And patch #3 adds
      a selftest for the problem. The major change compared with v1 is using
      the local_irq_{save,restore}() pair to disable and enable preemption
      instead of the preempt_{disable,enable}_notrace() pair. The main reason
      is to prevent potential overhead from __preempt_schedule_notrace(). I
      also ran the htab_mem benchmark and hash_map_perf on an 8-CPU KVM VM to
      compare the performance of local_irq_{save,restore} and
      preempt_{disable,enable}_notrace(), but the results are similar, as
      shown below:
      
      (1) use preempt_{disable,enable}_notrace()
      
      [root@hello bpf]# ./map_perf_test 4 8
      0:hash_map_perf kmalloc 652179 events per sec
      1:hash_map_perf kmalloc 651880 events per sec
      2:hash_map_perf kmalloc 651382 events per sec
      3:hash_map_perf kmalloc 650791 events per sec
      5:hash_map_perf kmalloc 650140 events per sec
      6:hash_map_perf kmalloc 652773 events per sec
      7:hash_map_perf kmalloc 652751 events per sec
      4:hash_map_perf kmalloc 648199 events per sec
      
      [root@hello bpf]# ./benchs/run_bench_htab_mem.sh
      normal bpf ma
      =============
      overwrite            per-prod-op: 110.82 ± 0.02k/s, avg mem: 2.00 ± 0.00MiB, peak mem: 2.73MiB
      batch_add_batch_del  per-prod-op: 89.79 ± 0.75k/s, avg mem: 1.68 ± 0.38MiB, peak mem: 2.73MiB
      add_del_on_diff_cpu  per-prod-op: 17.83 ± 0.07k/s, avg mem: 25.68 ± 2.92MiB, peak mem: 35.10MiB
      
      (2) use local_irq_{save,restore}
      
      [root@hello bpf]# ./map_perf_test 4 8
      0:hash_map_perf kmalloc 656299 events per sec
      1:hash_map_perf kmalloc 656397 events per sec
      2:hash_map_perf kmalloc 656046 events per sec
      3:hash_map_perf kmalloc 655723 events per sec
      5:hash_map_perf kmalloc 655221 events per sec
      4:hash_map_perf kmalloc 654617 events per sec
      6:hash_map_perf kmalloc 650269 events per sec
      7:hash_map_perf kmalloc 653665 events per sec
      
      [root@hello bpf]# ./benchs/run_bench_htab_mem.sh
      normal bpf ma
      =============
      overwrite            per-prod-op: 116.10 ± 0.02k/s, avg mem: 2.00 ± 0.00MiB, peak mem: 2.74MiB
      batch_add_batch_del  per-prod-op: 88.76 ± 0.61k/s, avg mem: 1.94 ± 0.33MiB, peak mem: 2.74MiB
      add_del_on_diff_cpu  per-prod-op: 18.12 ± 0.08k/s, avg mem: 25.10 ± 2.70MiB, peak mem: 34.78MiB
      
      As usual, comments are always welcome.
      
      Change Log:
      v2:
        * Use local_irq_save to disable preemption instead of using
          preempt_{disable,enable}_notrace pair to prevent potential overhead
      
      v1: https://lore.kernel.org/bpf/20230822133807.3198625-1-houtao@huaweicloud.com/
      ====================
      
      Link: https://lore.kernel.org/r/20230901111954.1804721-1-houtao@huaweicloud.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      35897c3c
    • bpf: task_group_seq_get_next: simplify the "next tid" logic · 780aa8df
      Oleg Nesterov authored
      Kill saved_tid. It looks ugly to update *tid and then restore the
      previous value if __task_pid_nr_ns() returns 0. Change this code
      to update *tid and common->pid_visiting once before return.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Yonghong Song <yonghong.song@linux.dev>
      Link: https://lore.kernel.org/r/20230905154656.GA24950@redhat.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      780aa8df
    • selftests/bpf: Test preemption between bpf_obj_new() and bpf_obj_drop() · 29c11aa8
      Hou Tao authored
      The test case creates 4 threads and then pins these 4 threads to CPU 0.
      These 4 threads will run different bpf programs through
      bpf_prog_test_run_opts(), and these bpf programs will use bpf_obj_new()
      and bpf_obj_drop() to allocate and free local kptrs concurrently.
      
      Under a preemptible kernel, bpf_obj_new() and bpf_obj_drop() may preempt
      each other, bpf_obj_new() may return NULL, and the test will fail before
      these fixes are applied, as shown below:
      
        test_preempted_bpf_ma_op:PASS:open_and_load 0 nsec
        test_preempted_bpf_ma_op:PASS:attach 0 nsec
        test_preempted_bpf_ma_op:PASS:no test prog 0 nsec
        test_preempted_bpf_ma_op:PASS:no test prog 0 nsec
        test_preempted_bpf_ma_op:PASS:no test prog 0 nsec
        test_preempted_bpf_ma_op:PASS:no test prog 0 nsec
        test_preempted_bpf_ma_op:PASS:pthread_create 0 nsec
        test_preempted_bpf_ma_op:PASS:pthread_create 0 nsec
        test_preempted_bpf_ma_op:PASS:pthread_create 0 nsec
        test_preempted_bpf_ma_op:PASS:pthread_create 0 nsec
        test_preempted_bpf_ma_op:PASS:run prog err 0 nsec
        test_preempted_bpf_ma_op:PASS:run prog err 0 nsec
        test_preempted_bpf_ma_op:PASS:run prog err 0 nsec
        test_preempted_bpf_ma_op:PASS:run prog err 0 nsec
        test_preempted_bpf_ma_op:FAIL:ENOMEM unexpected ENOMEM: got TRUE
        #168     preempted_bpf_ma_op:FAIL
        Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED
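      
      A rough sketch (hypothetical names) of the user-space side of such a
      worker thread: pin the thread to CPU 0 and drive the bpf program via
      bpf_prog_test_run_opts():
      
        #define _GNU_SOURCE
        #include <pthread.h>
        #include <sched.h>
        #include <bpf/bpf.h>
      
        static void *worker(void *arg)
        {
                int prog_fd = *(int *)arg;
                cpu_set_t set;
                LIBBPF_OPTS(bpf_test_run_opts, topts, .repeat = 64);
      
                /* pin this worker to CPU 0 so the threads contend */
                CPU_ZERO(&set);
                CPU_SET(0, &set);
                pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
      
                /* run the bpf program; it allocates/frees local kptrs */
                bpf_prog_test_run_opts(prog_fd, &topts);
                return NULL;
        }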
      Signed-off-by: Hou Tao <houtao1@huawei.com>
      Link: https://lore.kernel.org/r/20230901111954.1804721-4-houtao@huaweicloud.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      29c11aa8
    • bpf: task_group_seq_get_next: kill next_task · 0ee9808b
      Oleg Nesterov authored
      It only adds unnecessary confusion and complicates the "retry" code.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Yonghong Song <yonghong.song@linux.dev>
      Link: https://lore.kernel.org/r/20230905154654.GA24945@redhat.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      0ee9808b
    • bpf: Enable IRQ after irq_work_raise() completes in unit_free{_rcu}() · 62cf51cb
      Hou Tao authored
      Both unit_free() and unit_free_rcu() invoke irq_work_raise() to free
      the freed objects back to the slab, and the invocation may also be
      preempted by unit_alloc(), so unit_alloc() may return NULL unexpectedly
      as shown in the following case:
      
      task A         task B
      
      unit_free()
        // high_watermark = 48
        // free_cnt = 49 after free
        irq_work_raise()
          // mark irq work as IRQ_WORK_PENDING
          irq_work_claim()
      
                     // task B preempts task A
                     unit_alloc()
                       // free_cnt = 48 after alloc
      
                     // does unit_alloc() 32-times
      	       ......
      	       // free_cnt = 16
      
      	       unit_alloc()
      	         // free_cnt = 15 after alloc
                       // irq work is already PENDING,
                       // so just return
                       irq_work_raise()
      
      	       // does unit_alloc() 15-times
                     ......
      	       // free_cnt = 0
      
                     unit_alloc()
                       // free_cnt = 0 before alloc
                       return NULL
      
      Fix it by enabling IRQ after irq_work_raise() completes.
      Signed-off-by: Hou Tao <houtao1@huawei.com>
      Link: https://lore.kernel.org/r/20230901111954.1804721-3-houtao@huaweicloud.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      62cf51cb
    • bpf: task_group_seq_get_next: fix the skip_if_dup_files check · 87abbf7a
      Oleg Nesterov authored
      Unless I am totally confused it is wrong. We are going to return or
      skip next_task, so we need to check next_task->files, not task->files.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Yonghong Song <yonghong.song@linux.dev>
      Link: https://lore.kernel.org/r/20230905154651.GA24940@redhat.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      87abbf7a
    • bpf: task_group_seq_get_next: cleanup the usage of get/put_task_struct · 49819213
      Oleg Nesterov authored
      get_pid_task() makes no sense, the code does put_task_struct() soon after.
      Use find_task_by_pid_ns() instead of find_pid_ns() + get_pid_task() and
      kill put_task_struct(); this allows us to do get_task_struct() only once
      before return.
      
      While at it, kill the unnecessary "if (!pid)" check in the "if (!*tid)"
      block; this matches the next usage of find_pid_ns() + get_pid_task() in
      this function.
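      
      The resulting lookup follows the usual idiom (a sketch of the general
      pattern, not the exact new code in task_group_seq_get_next()):
      
        static struct task_struct *lookup_task(pid_t tid, struct pid_namespace *ns)
        {
                struct task_struct *task;
      
                rcu_read_lock();
                task = find_task_by_pid_ns(tid, ns);
                if (task)
                        get_task_struct(task);  /* single reference, taken once */
                rcu_read_unlock();
      
                return task;
        }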
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Yonghong Song <yonghong.song@linux.dev>
      Link: https://lore.kernel.org/r/20230905154649.GA24935@redhat.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      49819213
    • bpf: task_group_seq_get_next: cleanup the usage of next_thread() · 1a00ef57
      Oleg Nesterov authored
      1. find_pid_ns() + get_pid_task() under rcu_read_lock() guarantees that we
         can safely iterate the task->thread_group list. Even if this task exits
         right after get_pid_task() (or goto retry) and pid_alive() returns 0.
      
         Kill the unnecessary pid_alive() check.
      
      2. next_thread() simply can't return NULL, kill the bogus "if (!next_task)"
         check.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: Yonghong Song <yonghong.song@linux.dev>
      Link: https://lore.kernel.org/r/20230905154646.GA24928@redhat.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      1a00ef57
    • Merge branch 'bpf-add-support-for-local-percpu-kptr' · 1e4a6d97
      Alexei Starovoitov authored
      Yonghong Song says:
      
      ====================
      bpf: Add support for local percpu kptr
      
      Patch set [1] implemented cgroup local storage BPF_MAP_TYPE_CGRP_STORAGE,
      similar to sk/task/inode local storage, and the old BPF_MAP_TYPE_CGROUP_STORAGE
      map was marked as deprecated since it can only work with the current cgroup.
      
      Similarly, the existing BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE map
      is a percpu version of BPF_MAP_TYPE_CGROUP_STORAGE and only works
      with the current cgroup. But there is no replacement which can work
      with an arbitrary cgroup.
      
      This patch set solves the problem by adding support for local
      percpu kptr. The map value can have a percpu kptr field which holds
      bpf-program-allocated percpu data. Below is an example:
      
        struct percpu_val_t {
          ... fields ...
        };
      
        struct map_value_t {
          struct percpu_val_t __percpu_kptr *percpu_data_ptr;
        };
      
      In the above, 'map_value_t' is the map value type for a
      BPF_MAP_TYPE_CGRP_STORAGE map. Users can access 'percpu_data_ptr'
      and then read/write the percpu data. This covers BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE
      and more, so the BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE map type
      is marked as deprecated.
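      
      A rough BPF-side usage sketch (hypothetical 'map_val' pointer obtained
      from a map lookup; bpf_percpu_obj_new()/bpf_percpu_obj_drop() from the
      selftests' bpf_experimental.h, bpf_kptr_xchg() and bpf_this_cpu_ptr()
      as used by the selftests in this series):
      
        /* allocation path: publish a new percpu object into the map value */
        struct percpu_val_t *p = bpf_percpu_obj_new(struct percpu_val_t);
      
        if (p) {
                p = bpf_kptr_xchg(&map_val->percpu_data_ptr, p);
                if (p)                  /* drop whatever was there before */
                        bpf_percpu_obj_drop(p);
        }
      
        /* access path, under bpf_rcu_read_lock()/bpf_rcu_read_unlock() */
        struct percpu_val_t *pc = map_val->percpu_data_ptr;
      
        if (pc) {
                struct percpu_val_t *v = bpf_this_cpu_ptr(pc);
                /* ... read/write this CPU's fields through v ... */
        }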
      
      In addition, local percpu kptr supports the same map types
      as other kptrs, including hash, lru_hash, array, and sk/inode/task/cgrp
      local storage. Currently, the percpu data structure does not support
      non-scalars or special fields (e.g., bpf_spin_lock, bpf_rb_root, etc.).
      They can be supported in the future if there are use cases.
      
      Please see individual patches for details.
      
        [1] https://lore.kernel.org/all/20221026042835.672317-1-yhs@fb.com/
      
      Changelog:
        v2 -> v3:
          - fix libbpf_str test failure.
        v1 -> v2:
          - does not support special fields in percpu data structure.
          - rename __percpu attr to __percpu_kptr attr.
          - rename BPF_KPTR_PERCPU_REF to BPF_KPTR_PERCPU.
          - better code to handle bpf_{this,per}_cpu_ptr() helpers.
          - add more negative tests.
          - fix a bpftool related test failure.
      ====================
      
      Link: https://lore.kernel.org/r/20230827152729.1995219-1-yonghong.song@linux.dev
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      1e4a6d97
    • bpf: Enable IRQ after irq_work_raise() completes in unit_alloc() · 566f6de3
      Hou Tao authored
      When doing a stress test for qp-trie, bpf_mem_alloc() returned NULL
      unexpectedly, because all qp-trie operations were initiated from
      bpf syscalls and there was still free memory available. bpf_obj_new()
      has the same problem, as shown by the following selftest.
      
      The failure is due to preemption. irq_work_raise() will invoke
      irq_work_claim() first to mark the irq work as pending and then invoke
      __irq_work_queue_local() to raise an IPI. So when the current task
      which is invoking irq_work_raise() is preempted by another task,
      unit_alloc() may return NULL for the preempting task as shown below:
      
      task A         task B
      
      unit_alloc()
        // low_watermark = 32
        // free_cnt = 31 after alloc
        irq_work_raise()
          // mark irq work as IRQ_WORK_PENDING
          irq_work_claim()
      
      	       // task B preempts task A
      	       unit_alloc()
      	         // free_cnt = 30 after alloc
      	         // irq work is already PENDING,
      	         // so just return
      	         irq_work_raise()
      	       // does unit_alloc() 30-times
      	       ......
      	       unit_alloc()
      	         // free_cnt = 0 before alloc
      	         return NULL
      
      Fix it by enabling IRQ after irq_work_raise() completes. An alternative
      fix is using preempt_{disable|enable}_notrace() pair, but it may have
      extra overhead. Another feasible fix is to only disable preemption or
      IRQ before invoking irq_work_queue() and enable preemption or IRQ after
      the invocation completes, but it can't handle the case when
      c->low_watermark is 1.
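      
      In other words, the fix makes unit_alloc() look roughly like this (a
      sketch of the resulting shape, not the exact diff):
      
        local_irq_save(flags);
        /* ... pop an object off the per-cpu free list, update free_cnt ... */
        if (cnt < c->low_watermark)
                irq_work_raise(c);      /* raised before IRQs are re-enabled,
                                         * so it cannot be preempted half-way */
        local_irq_restore(flags);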
      Signed-off-by: Hou Tao <houtao1@huawei.com>
      Link: https://lore.kernel.org/r/20230901111954.1804721-2-houtao@huaweicloud.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      566f6de3
    • bpf: Mark BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE deprecated · 9bc95a95
      Yonghong Song authored
      Now 'BPF_MAP_TYPE_CGRP_STORAGE + local percpu ptr'
      can cover all BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE functionality
      and more. So mark BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE deprecated.
      Also make changes in selftests/bpf/test_bpftool_synctypes.py
      and the libbpf_str selftest to fix test errors that would otherwise occur.
      Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
      Link: https://lore.kernel.org/r/20230827152837.2003563-1-yonghong.song@linux.dev
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      9bc95a95
    • selftests/bpf: Add some negative tests · 1bd79317
      Yonghong Song authored
      Add a few negative tests for common mistakes with using percpu kptr,
      including:
        - store to percpu kptr.
        - type mismatch in bpf_kptr_xchg arguments.
        - sleepable prog with untrusted arg for bpf_this_cpu_ptr().
        - bpf_percpu_obj_new && bpf_obj_drop, and bpf_obj_new && bpf_percpu_obj_drop.
        - struct with ptr for bpf_percpu_obj_new.
        - struct with special field (e.g., bpf_spin_lock) for bpf_percpu_obj_new.
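      
      For example, the mismatched alloc/free pair above looks roughly like
      this (a hypothetical sketch with a made-up 'struct val_t', not the
      actual selftest; the verifier is expected to reject it):
      
        struct val_t *p;
      
        p = bpf_percpu_obj_new(struct val_t);
        if (!p)
                return 0;
        /* wrong pair: percpu allocation freed with the non-percpu drop */
        bpf_obj_drop(p);
        return 0;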
      Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
      Link: https://lore.kernel.org/r/20230827152832.2002421-1-yonghong.song@linux.dev
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      1bd79317
    • selftests/bpf: Add tests for cgrp_local_storage with local percpu kptr · dfae1eee
      Yonghong Song authored
      Add a non-sleepable cgrp_local_storage test with percpu kptr. The
      test allocates percpu data, assigns values to the percpu data and
      retrieves the percpu data. The de-allocation of the percpu data is
      done when the map is freed.
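      
      A sketch of the shape of such a map and its lookup (the value type is
      the hypothetical one from the cover letter; bpf_cgrp_storage_get() is
      the regular cgrp-storage helper):
      
        struct map_value_t {
                struct percpu_val_t __percpu_kptr *percpu_data_ptr;
        };
      
        struct {
                __uint(type, BPF_MAP_TYPE_CGRP_STORAGE);
                __uint(map_flags, BPF_F_NO_PREALLOC);
                __type(key, int);
                __type(value, struct map_value_t);
        } cgrp_map SEC(".maps");
      
        /* in the program, for a given struct cgroup *cgrp: */
        struct map_value_t *v;
      
        v = bpf_cgrp_storage_get(&cgrp_map, cgrp, 0,
                                 BPF_LOCAL_STORAGE_GET_F_CREATE);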
      Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
      Link: https://lore.kernel.org/r/20230827152827.2001784-1-yonghong.song@linux.dev
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      dfae1eee
    • selftests/bpf: Remove unnecessary direct read of local percpu kptr · 46200d6d
      Yonghong Song authored
      For the second argument of bpf_kptr_xchg(), if the reg type contains
      MEM_ALLOC and MEM_PERCPU, which means a percpu allocation, then
      after bpf_kptr_xchg() the argument is marked as MEM_RCU and MEM_PERCPU
      if in an rcu critical section. This way, re-reading from the map value
      is not needed. Remove such a re-read from the percpu_alloc_array.c selftest.
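      
      That is, the selftest can keep using the just-exchanged pointer
      directly (a sketch using the names from the verifier log below):
      
        p = bpf_percpu_obj_new(struct val_t);
        ...
        p1 = bpf_kptr_xchg(&e->pc, p);
        if (p1)
                bpf_percpu_obj_drop(p1);
        /* no need to re-read e->pc: 'p' keeps its percpu pointer type */
        v = bpf_this_cpu_ptr(p);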
      
      Without previous kernel change, the test will fail like below:
      
        0: R1=ctx(off=0,imm=0) R10=fp0
        ; int BPF_PROG(test_array_map_10, int a)
        0: (b4) w1 = 0                        ; R1_w=0
        ; int i, index = 0;
        1: (63) *(u32 *)(r10 -4) = r1         ; R1_w=0 R10=fp0 fp-8=0000????
        2: (bf) r2 = r10                      ; R2_w=fp0 R10=fp0
        ;
        3: (07) r2 += -4                      ; R2_w=fp-4
        ; e = bpf_map_lookup_elem(&array, &index);
        4: (18) r1 = 0xffff88810e771800       ; R1_w=map_ptr(off=0,ks=4,vs=16,imm=0)
        6: (85) call bpf_map_lookup_elem#1    ; R0_w=map_value_or_null(id=1,off=0,ks=4,vs=16,imm=0)
        7: (bf) r6 = r0                       ; R0_w=map_value_or_null(id=1,off=0,ks=4,vs=16,imm=0) R6_w=map_value_or_null(id=1,off=0,ks=4,vs=16,imm=0)
        ; if (!e)
        8: (15) if r6 == 0x0 goto pc+81       ; R6_w=map_value(off=0,ks=4,vs=16,imm=0)
        ; bpf_rcu_read_lock();
        9: (85) call bpf_rcu_read_lock#87892          ;
        ; p = e->pc;
        10: (bf) r7 = r6                      ; R6=map_value(off=0,ks=4,vs=16,imm=0) R7_w=map_value(off=0,ks=4,vs=16,imm=0)
        11: (07) r7 += 8                      ; R7_w=map_value(off=8,ks=4,vs=16,imm=0)
        12: (79) r6 = *(u64 *)(r6 +8)         ; R6_w=percpu_rcu_ptr_or_null_val_t(id=2,off=0,imm=0)
        ; if (!p) {
        13: (55) if r6 != 0x0 goto pc+13      ; R6_w=0
        ; p = bpf_percpu_obj_new(struct val_t);
        14: (18) r1 = 0x12                    ; R1_w=18
        16: (b7) r2 = 0                       ; R2_w=0
        17: (85) call bpf_percpu_obj_new_impl#87883   ; R0_w=percpu_ptr_or_null_val_t(id=4,ref_obj_id=4,off=0,imm=0) refs=4
        18: (bf) r6 = r0                      ; R0=percpu_ptr_or_null_val_t(id=4,ref_obj_id=4,off=0,imm=0) R6=percpu_ptr_or_null_val_t(id=4,ref_obj_id=4,off=0,imm=0) refs=4
        ; if (!p)
        19: (15) if r6 == 0x0 goto pc+69      ; R6=percpu_ptr_val_t(ref_obj_id=4,off=0,imm=0) refs=4
        ; p1 = bpf_kptr_xchg(&e->pc, p);
        20: (bf) r1 = r7                      ; R1_w=map_value(off=8,ks=4,vs=16,imm=0) R7=map_value(off=8,ks=4,vs=16,imm=0) refs=4
        21: (bf) r2 = r6                      ; R2_w=percpu_ptr_val_t(ref_obj_id=4,off=0,imm=0) R6=percpu_ptr_val_t(ref_obj_id=4,off=0,imm=0) refs=4
        22: (85) call bpf_kptr_xchg#194       ; R0_w=percpu_ptr_or_null_val_t(id=6,ref_obj_id=6,off=0,imm=0) refs=6
        ; if (p1) {
        23: (15) if r0 == 0x0 goto pc+3       ; R0_w=percpu_ptr_val_t(ref_obj_id=6,off=0,imm=0) refs=6
        ; bpf_percpu_obj_drop(p1);
        24: (bf) r1 = r0                      ; R0_w=percpu_ptr_val_t(ref_obj_id=6,off=0,imm=0) R1_w=percpu_ptr_val_t(ref_obj_id=6,off=0,imm=0) refs=6
        25: (b7) r2 = 0                       ; R2_w=0 refs=6
        26: (85) call bpf_percpu_obj_drop_impl#87882          ;
        ; v = bpf_this_cpu_ptr(p);
        27: (bf) r1 = r6                      ; R1_w=scalar(id=7) R6=scalar(id=7)
        28: (85) call bpf_this_cpu_ptr#154
        R1 type=scalar expected=percpu_ptr_, percpu_rcu_ptr_, percpu_trusted_ptr_
      
      R1, which gets its value from R6, is a scalar. But before insn 22, R6 is
        R6=percpu_ptr_val_t(ref_obj_id=4,off=0,imm=0)
      Its type is changed to a scalar at insn 22 without the previous patch.
      Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
      Link: https://lore.kernel.org/r/20230827152821.2001129-1-yonghong.song@linux.dev
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      46200d6d