1. 24 Aug, 2021 1 commit
    • bpf: Migrate cgroup_bpf to internal cgroup_bpf_attach_type enum · 6fc88c35
      Dave Marchevsky authored
      Add an enum (cgroup_bpf_attach_type) containing only valid cgroup_bpf
      attach types and a function to map bpf_attach_type values to the new
      enum. Inspired by netns_bpf_attach_type.
      
      Then, migrate cgroup_bpf to use cgroup_bpf_attach_type wherever
      possible.  Functionality is unchanged as attach_type_to_prog_type
      switches in bpf/syscall.c were preventing non-cgroup programs from
      making use of the invalid cgroup_bpf array slots.
      
      As a result struct cgroup_bpf uses 504 fewer bytes relative to when its
      arrays were sized using MAX_BPF_ATTACH_TYPE.
      
      bpf_cgroup_storage is notably not migrated as struct
      bpf_cgroup_storage_key is part of uapi and contains a bpf_attach_type
      member which is not meant to be opaque. Similarly, bpf_cgroup_link
      continues to report its bpf_attach_type member to userspace via fdinfo
      and bpf_link_info.
      
      To ease disambiguation, bpf_attach_type variables are renamed from
      'type' to 'atype' when changed to cgroup_bpf_attach_type.
      Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20210819092420.1984861-2-davemarchevsky@fb.com
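As a rough illustration of the idea in this commit, the sketch below maps a large "public" enum onto a compact internal enum that contains only the valid cgroup entries, and sizes a per-cgroup array with the compact one. All names here (`attach_type`, `cgroup_attach_type`, `to_cgroup_attach_type`, `cgroup_state`) are hypothetical stand-ins, not the kernel's definitions, and the enums are heavily shortened.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the large uapi enum; the real one has many more entries. */
enum attach_type {
    ATTACH_CGROUP_INET_INGRESS,
    ATTACH_SK_SKB_STREAM_PARSER,   /* not a cgroup attach type */
    ATTACH_CGROUP_INET_EGRESS,
    ATTACH_FLOW_DISSECTOR,         /* not a cgroup attach type */
    MAX_ATTACH_TYPE
};

/* Compact internal enum holding only the cgroup-valid types. */
enum cgroup_attach_type {
    CGROUP_ATTACH_TYPE_INVALID = -1,
    CGROUP_INET_INGRESS = 0,
    CGROUP_INET_EGRESS,
    MAX_CGROUP_ATTACH_TYPE
};

/* The compact enum only pays off if it is smaller than the full one. */
static_assert(MAX_CGROUP_ATTACH_TYPE < MAX_ATTACH_TYPE,
              "compact enum must be smaller than the full enum");

/* Map a public attach type to its compact slot, or INVALID if it has none. */
enum cgroup_attach_type to_cgroup_attach_type(enum attach_type type)
{
    switch (type) {
    case ATTACH_CGROUP_INET_INGRESS:
        return CGROUP_INET_INGRESS;
    case ATTACH_CGROUP_INET_EGRESS:
        return CGROUP_INET_EGRESS;
    default:
        return CGROUP_ATTACH_TYPE_INVALID;
    }
}

/* Per-cgroup state sized by the compact enum rather than the full one. */
struct cgroup_state {
    void *progs[MAX_CGROUP_ATTACH_TYPE];
};
```

In this toy version the array shrinks from four slots to two; the commit reports the corresponding saving for the real struct cgroup_bpf as 504 bytes.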
  2. 23 Aug, 2021 1 commit
    • af_unix: Fix NULL pointer bug in unix_shutdown · d359902d
      Jiang Wang authored
      Commit 94531cfc ("af_unix: Add unix_stream_proto for sockmap")
      introduced a bug for the af_unix SEQPACKET type. In unix_shutdown(), the
      unhash step calls prot->unhash(), which is NULL for SEQPACKET, so the
      kernel panics. The log below is from ARM32; the bug likely affects x86
      too.
      
      Fix the bug by first checking whether prot->unhash is NULL.
      
      Kernel log:
      <--- cut here ---
       Unable to handle kernel NULL pointer dereference at virtual address 00000000
       pgd = 2fba1ffb
       *pgd=00000000
       Internal error: Oops: 80000005 [#1] PREEMPT SMP THUMB2
       Modules linked in:
       CPU: 1 PID: 1999 Comm: falkon Tainted: G        W 5.14.0-rc5-01175-g94531cfc-dirty #9240
       Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
       PC is at 0x0
       LR is at unix_shutdown+0x81/0x1a8
       pc : [<00000000>]    lr : [<c08f3311>]    psr: 600f0013
       sp : e45aff70  ip : e463a3c0  fp : beb54f04
       r10: 00000125  r9 : e45ae000  r8 : c4a56664
       r7 : 00000001  r6 : c4a56464  r5 : 00000001  r4 : c4a56400
       r3 : 00000000  r2 : c5a6b180  r1 : 00000000  r0 : c4a56400
       Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
       Control: 50c5387d  Table: 05aa804a  DAC: 00000051
       Register r0 information: slab PING start c4a56400 pointer offset 0
       Register r1 information: NULL pointer
       Register r2 information: slab task_struct start c5a6b180 pointer offset 0
       Register r3 information: NULL pointer
       Register r4 information: slab PING start c4a56400 pointer offset 0
       Register r5 information: non-paged memory
       Register r6 information: slab PING start c4a56400 pointer offset 100
       Register r7 information: non-paged memory
       Register r8 information: slab PING start c4a56400 pointer offset 612
       Register r9 information: non-slab/vmalloc memory
       Register r10 information: non-paged memory
       Register r11 information: non-paged memory
       Register r12 information: slab filp start e463a3c0 pointer offset 0
       Process falkon (pid: 1999, stack limit = 0x9ec48895)
       Stack: (0xe45aff70 to 0xe45b0000)
       ff60:                                     e45ae000 c5f26a00 00000000 00000125
       ff80: c0100264 c07f7fa3 beb54f04 fffffff7 00000001 e6f3fc0e b5e5e9ec beb54ec4
       ffa0: b5da0ccc c010024b b5e5e9ec beb54ec4 0000000f 00000000 00000000 beb54ebc
       ffc0: b5e5e9ec beb54ec4 b5da0ccc 00000125 beb54f58 00785238 beb5529c beb54f04
       ffe0: b5da1e24 beb54eac b301385c b62b6ee8 600f0030 0000000f 00000000 00000000
       [<c08f3311>] (unix_shutdown) from [<c07f7fa3>] (__sys_shutdown+0x2f/0x50)
       [<c07f7fa3>] (__sys_shutdown) from [<c010024b>] (__sys_trace_return+0x1/0x16)
       Exception stack(0xe45affa8 to 0xe45afff0)
      
      Fixes: 94531cfc ("af_unix: Add unix_stream_proto for sockmap")
      Reported-by: Dmitry Osipenko <digetx@gmail.com>
      Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Tested-by: Dmitry Osipenko <digetx@gmail.com>
      Acked-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
      Link: https://lore.kernel.org/bpf/20210821180738.1151155-1-jiang.wang@bytedance.com
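The failure mode in this commit is a call through a NULL function pointer (hence "PC is at 0x0" in the log). A minimal userspace sketch of the guard might look like the following; `proto_ops_model`, `sock_model`, and `shutdown_unhash` are hypothetical names for illustration and do not mirror the kernel's struct proto or struct sock.

```c
#include <stddef.h>

struct sock_model;

/* Hypothetical, reduced ops table: only the optional callback of interest. */
struct proto_ops_model {
    void (*unhash)(struct sock_model *sk);
};

struct sock_model {
    const struct proto_ops_model *prot;
    int unhashed;   /* set by the callback, for demonstration only */
};

/* Example callback, as a stream-type protocol might provide. */
void stream_unhash(struct sock_model *sk)
{
    sk->unhashed = 1;
}

/* Returns 1 if the callback ran, 0 if it was absent and therefore skipped. */
int shutdown_unhash(struct sock_model *sk)
{
    if (sk->prot->unhash == NULL)
        return 0;   /* e.g. a protocol that never registered unhash */
    sk->prot->unhash(sk);
    return 1;
}
```

A socket whose ops table leaves `unhash` unset goes through `shutdown_unhash()` without jumping to address zero.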
  3. 19 Aug, 2021 7 commits
    • selftests/bpf: Add tests for {set|get} socket option from setsockopt BPF · f2a6ee92
      Prankur Gupta authored
      Add selftests for the newly added ability to call bpf_setsockopt()
      and bpf_getsockopt() from setsockopt BPF programs.
      
      Test Details:
      
      1. BPF Program
      
         Checks for changes to IPV6_TCLASS (SOL_IPV6) made via setsockopt.
         If the congestion control algorithm (cca) for the socket is not cubic, do nothing.
         If the newly set value for IPV6_TCLASS is 45 (0x2d) (as per our use case),
         change the cca from cubic to reno.
      
      2. User Space Program
      
         Creates an AF_INET6 socket and sets its cca to "cubic".
         Attaches the program and sets IPV6_TCLASS to 0x2d using setsockopt.
         Verifies that the cca for the socket changed to reno.
      Signed-off-by: Prankur Gupta <prankgup@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Song Liu <songliubraving@fb.com>
      Link: https://lore.kernel.org/bpf/20210817224221.3257826-3-prankgup@fb.com
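The decision logic described under "BPF Program" in the test details can be modelled in plain C. This is only a sketch of the rule, not the selftest itself: the function name `cc_after_setsockopt` and the `MODEL_` constants are hypothetical, and a real BPF program would read and change the algorithm through bpf_getsockopt() and bpf_setsockopt() rather than by returning a string.

```c
#include <string.h>

/* Local stand-ins for the socket level and option; values are illustrative. */
#define MODEL_SOL_IPV6    41
#define MODEL_IPV6_TCLASS 67
#define MODEL_TCLASS_TRIGGER 0x2d   /* 45, the value used in the test */

/*
 * Returns the congestion control name the socket should end up with after
 * a setsockopt(level, optname, value) call, given its current algorithm.
 */
const char *cc_after_setsockopt(const char *current_cc, int level,
                                int optname, int value)
{
    /* Only react to IPV6_TCLASS changes. */
    if (level != MODEL_SOL_IPV6 || optname != MODEL_IPV6_TCLASS)
        return current_cc;
    /* If the socket is not using cubic, do nothing. */
    if (strcmp(current_cc, "cubic") != 0)
        return current_cc;
    /* Switch cubic to reno only for the trigger value. */
    if (value == MODEL_TCLASS_TRIGGER)
        return "reno";
    return current_cc;
}
```

The user-space half of the test then corresponds to calling this with "cubic" and 0x2d and checking that the result is "reno".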
    • bpf: Add support for {set|get} socket options from setsockopt BPF · 2c531639
      Prankur Gupta authored
      Add logic to call bpf_setsockopt() and bpf_getsockopt() from setsockopt BPF
      programs. An example use case is when the user sets the IPV6_TCLASS socket
      option, we would also like to change the tcp-cc for that socket.
      
      We don't have any use case for calling bpf_setsockopt() from supposedly read-
      only sys_getsockopt(), so it is made available to BPF_CGROUP_SETSOCKOPT only
      at this point.
      Signed-off-by: Prankur Gupta <prankgup@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Song Liu <songliubraving@fb.com>
      Link: https://lore.kernel.org/bpf/20210817224221.3257826-2-prankgup@fb.com
    • bpf: Use kvmalloc for map keys in syscalls · 44779a4b
      Stanislav Fomichev authored
      Same as the previous patch ("bpf: Use kvmalloc for map values in
      syscall") but for the keys. memdup_bpfptr is renamed to
      kvmemdup_bpfptr (and converted to kvmalloc).
      Signed-off-by: Stanislav Fomichev <sdf@google.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Song Liu <songliubraving@fb.com>
      Link: https://lore.kernel.org/bpf/20210818235216.1159202-2-sdf@google.com
    • bpf: Use kvmalloc for map values in syscall · f0dce1d9
      Stanislav Fomichev authored
      Use kvmalloc/kvfree for temporary value when manipulating a map via
      syscall. kmalloc might not be sufficient for percpu maps where the value
      is big (and further multiplied by hundreds of CPUs).
      
      Can be reproduced with netcnt test on qemu with "-smp 255".
      Signed-off-by: Stanislav Fomichev <sdf@google.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Song Liu <songliubraving@fb.com>
      Link: https://lore.kernel.org/bpf/20210818235216.1159202-1-sdf@google.com
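To see why the temporary buffer for a per-CPU map value can grow large, the arithmetic can be sketched as below. The helper names are hypothetical, the 8-byte rounding is an assumption about how per-CPU values are laid out in the buffer, and `ASSUMED_CONTIG_LIMIT` is an arbitrary illustrative threshold, not a kernel constant.

```c
#include <stddef.h>

/* Round x up to the next multiple of 8. */
size_t round_up8(size_t x)
{
    return (x + 7) & ~(size_t)7;
}

/* Bytes needed to hold one map value for every CPU (assumed layout). */
size_t percpu_value_buf_size(size_t value_size, unsigned int nr_cpus)
{
    return round_up8(value_size) * nr_cpus;
}

/* Hypothetical size above which a physically contiguous allocation is assumed to fail. */
#define ASSUMED_CONTIG_LIMIT ((size_t)4 << 20)

/* Nonzero if the buffer would exceed the assumed contiguous-allocation limit. */
int exceeds_contig_limit(size_t value_size, unsigned int nr_cpus)
{
    return percpu_value_buf_size(value_size, nr_cpus) > ASSUMED_CONTIG_LIMIT;
}
```

With 255 CPUs, as in the qemu reproducer, a 32 KiB value already needs a buffer of roughly 8 MB, which is the kind of request where falling back to a virtually contiguous allocation helps.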
    • selftests/bpf: Adding delay in socketmap_listen to reduce flakyness · 3666b167
      Yucong Sun authored
      This patch adds a 1ms delay to reduce flakiness of the test.
      Signed-off-by: Yucong Sun <fallentree@fb.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20210819163609.2583758-1-fallentree@fb.com
    • bpf: Fix NULL event->prog pointer access in bpf_overflow_handler · 594286b7
      Yonghong Song authored
      Andrii reported that libbpf CI hit the following oops when
      running selftest send_signal:
        [ 1243.160719] BUG: kernel NULL pointer dereference, address: 0000000000000030
        [ 1243.161066] #PF: supervisor read access in kernel mode
        [ 1243.161066] #PF: error_code(0x0000) - not-present page
        [ 1243.161066] PGD 0 P4D 0
        [ 1243.161066] Oops: 0000 [#1] PREEMPT SMP NOPTI
        [ 1243.161066] CPU: 1 PID: 882 Comm: new_name Tainted: G           O      5.14.0-rc5 #1
        [ 1243.161066] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
        [ 1243.161066] RIP: 0010:bpf_overflow_handler+0x9a/0x1e0
        [ 1243.161066] Code: 5a 84 c0 0f 84 06 01 00 00 be 66 02 00 00 48 c7 c7 6d 96 07 82 48 8b ab 18 05 00 00 e8 df 55 eb ff 66 90 48 8d 75 48 48 89 e7 <ff> 55 30 41 89 c4 e8 fb c1 f0 ff 84 c0 0f 84 94 00 00 00 e8 6e 0f
        [ 1243.161066] RSP: 0018:ffffc900000c0d80 EFLAGS: 00000046
        [ 1243.161066] RAX: 0000000000000002 RBX: ffff8881002e0dd0 RCX: 00000000b4b47cf8
        [ 1243.161066] RDX: ffffffff811dcb06 RSI: 0000000000000048 RDI: ffffc900000c0d80
        [ 1243.161066] RBP: 0000000000000000 R08: 0000000000000000 R09: 1a9d56bb00000000
        [ 1243.161066] R10: 0000000000000001 R11: 0000000000080000 R12: 0000000000000000
        [ 1243.161066] R13: ffffc900000c0e00 R14: ffffc900001c3c68 R15: 0000000000000082
        [ 1243.161066] FS:  00007fc0be2d3380(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000
        [ 1243.161066] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [ 1243.161066] CR2: 0000000000000030 CR3: 0000000104f8e000 CR4: 00000000000006e0
        [ 1243.161066] Call Trace:
        [ 1243.161066]  <IRQ>
        [ 1243.161066]  __perf_event_overflow+0x4f/0xf0
        [ 1243.161066]  perf_swevent_hrtimer+0x116/0x130
        [ 1243.161066]  ? __lock_acquire+0x378/0x2730
        [ 1243.161066]  ? __lock_acquire+0x372/0x2730
        [ 1243.161066]  ? lock_is_held_type+0xd5/0x130
        [ 1243.161066]  ? find_held_lock+0x2b/0x80
        [ 1243.161066]  ? lock_is_held_type+0xd5/0x130
        [ 1243.161066]  ? perf_event_groups_first+0x80/0x80
        [ 1243.161066]  ? perf_event_groups_first+0x80/0x80
        [ 1243.161066]  __hrtimer_run_queues+0x1a3/0x460
        [ 1243.161066]  hrtimer_interrupt+0x110/0x220
        [ 1243.161066]  __sysvec_apic_timer_interrupt+0x8a/0x260
        [ 1243.161066]  sysvec_apic_timer_interrupt+0x89/0xc0
        [ 1243.161066]  </IRQ>
        [ 1243.161066]  asm_sysvec_apic_timer_interrupt+0x12/0x20
        [ 1243.161066] RIP: 0010:finish_task_switch+0xaf/0x250
        [ 1243.161066] Code: 31 f6 68 90 2a 09 81 49 8d 7c 24 18 e8 aa d6 03 00 4c 89 e7 e8 12 ff ff ff 4c 89 e7 e8 ca 9c 80 00 e8 35 af 0d 00 fb 4d 85 f6 <58> 74 1d 65 48 8b 04 25 c0 6d 01 00 4c 3b b0 a0 04 00 00 74 37 f0
        [ 1243.161066] RSP: 0018:ffffc900001c3d18 EFLAGS: 00000282
        [ 1243.161066] RAX: 000000000000031f RBX: ffff888104cf4980 RCX: 0000000000000000
        [ 1243.161066] RDX: 0000000000000000 RSI: ffffffff82095460 RDI: ffffffff820adc4e
        [ 1243.161066] RBP: ffffc900001c3d58 R08: 0000000000000001 R09: 0000000000000001
        [ 1243.161066] R10: 0000000000000001 R11: 0000000000080000 R12: ffff88813bd2bc80
        [ 1243.161066] R13: ffff8881002e8000 R14: ffff88810022ad80 R15: 0000000000000000
        [ 1243.161066]  ? finish_task_switch+0xab/0x250
        [ 1243.161066]  ? finish_task_switch+0x70/0x250
        [ 1243.161066]  __schedule+0x36b/0xbb0
        [ 1243.161066]  ? _raw_spin_unlock_irqrestore+0x2d/0x50
        [ 1243.161066]  ? lockdep_hardirqs_on+0x79/0x100
        [ 1243.161066]  schedule+0x43/0xe0
        [ 1243.161066]  pipe_read+0x30b/0x450
        [ 1243.161066]  ? wait_woken+0x80/0x80
        [ 1243.161066]  new_sync_read+0x164/0x170
        [ 1243.161066]  vfs_read+0x122/0x1b0
        [ 1243.161066]  ksys_read+0x93/0xd0
        [ 1243.161066]  do_syscall_64+0x35/0x80
        [ 1243.161066]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      The oops can also be reproduced with the following steps:
        ./vmtest.sh -s
        # at qemu shell
        cd /root/bpf && while true; do ./test_progs -t send_signal; done
      
      Further analysis showed that the failure is introduced with
      commit b89fbfbb ("bpf: Implement minimal BPF perf link").
      With the above commit, the following scenario becomes possible:
          cpu1                        cpu2
                                      hrtimer_interrupt -> bpf_overflow_handler
          (due to closing link_fd)
          bpf_perf_link_release ->
          perf_event_free_bpf_prog ->
          perf_event_free_bpf_handler ->
            WRITE_ONCE(event->overflow_handler, event->orig_overflow_handler)
            event->prog = NULL
                                      bpf_prog_run(event->prog, &ctx)
      
      In the above case, the event->prog is NULL for bpf_prog_run, hence
      causing oops.
      
      To fix the issue, check whether event->prog is NULL. If it is, do not
      call bpf_prog_run. This seems to work: the reproducer above ran for
      more than one hour without any failures.
      
      Fixes: b89fbfbb ("bpf: Implement minimal BPF perf link")
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20210819155209.1927994-1-yhs@fb.com
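The pattern behind this fix, reading the pointer once and skipping the call when it is NULL, can be sketched in userspace as follows. The types and names (`prog_model`, `event_model`, `overflow_handler_model`) are hypothetical, the volatile read only approximates a single load and is not a substitute for the kernel's own primitives, and the return value 0 for the detached case is a placeholder.

```c
#include <stddef.h>

struct prog_model {
    int (*run)(const struct prog_model *prog, void *ctx);
};

struct event_model {
    struct prog_model *prog;   /* may be cleared by a concurrent detach */
};

/* Example program body: counts invocations through ctx. */
int count_run(const struct prog_model *prog, void *ctx)
{
    (void)prog;
    *(int *)ctx += 1;
    return 1;
}

/* Runs the attached program if there still is one; returns 0 otherwise. */
int overflow_handler_model(struct event_model *event, void *ctx)
{
    /* Load the pointer once so the NULL check and the call see the same value. */
    struct prog_model *prog = *(struct prog_model *volatile *)&event->prog;

    if (prog == NULL)
        return 0;   /* program already detached: nothing to run */
    return prog->run(prog, ctx);
}
```

Checking `event->prog` and then dereferencing `event->prog` again would leave a window in which the detach path could clear it between the two reads, which is why the sketch works on a local copy.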
    • bpf: Undo off-by-one in interpreter tail call count limit · f9dabe01
      Daniel Borkmann authored
      The BPF interpreter as well as x86-64 BPF JIT were both in line by allowing
      up to 33 tail calls (however odd that number may be!). Recently, this was
      changed for the interpreter to reduce it down to 32 with the assumption that
      this should have been the actual limit "which is in line with the behavior of
      the x86 JITs" according to b61a28cf ("bpf: Fix off-by-one in tail call
      count limiting").
      
      Paul recently reported:
      
        I'm a bit surprised by this because I had previously tested the tail call
        limit of several JIT compilers and found it to be 33 (i.e., allowing chains
        of up to 34 programs). I've just extended a test program I had to validate
        this again on the x86-64 JIT, and found a limit of 33 tail calls again [1].
      
        Also note we had previously changed the RISC-V and MIPS JITs to allow up to
        33 tail calls [2, 3], for consistency with other JITs and with the interpreter.
        We had decided to increase these two to 33 rather than decrease the other
        JITs to 32 for backward compatibility, though that probably doesn't matter
        much as I'd expect few people to actually use 33 tail calls.
      
        [1] https://github.com/pchaigno/tail-call-bench/commit/ae7887482985b4b1745c9b2ef7ff9ae506c82886
        [2] 96bc4432 ("bpf, riscv: Limit to 33 tail calls")
        [3] e49e6f6d ("bpf, mips: Limit to 33 tail calls")
      
      Therefore, revert b61a28cf to re-align the interpreter to a maximum of
      33 tail calls. While it is unlikely that the vast majority will hit the limit,
      programs in the wild could one way or another depend on this, so let's rather
      be a bit more conservative, and let's align the small remainder of JITs to 33.
      If needed in future, this limit could be slightly increased, but not decreased.
      
      Fixes: b61a28cf ("bpf: Fix off-by-one in tail call count limiting")
      Reported-by: Paul Chaignon <paul@cilium.io>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/CAO5pjwTWrC0_dzTbTHFPSqDwA56aVH+4KFGVqdq8=ASs0MqZGQ@mail.gmail.com
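The difference between 33 and 32 comes down to the comparison used against the counter. The sketch below assumes a limit constant of 32 and a check performed before the counter is incremented; both are modelling assumptions, and the function names are hypothetical. It simulates a chain of programs that would tail-call forever and counts how many calls get through.

```c
#define MODEL_MAX_TAIL_CALL_CNT 32   /* assumed value of the limit constant */

/* Tail calls allowed when the check is "count > limit" before incrementing. */
int tail_calls_allowed_gt(void)
{
    int cnt = 0, calls = 0;

    for (;;) {
        if (cnt > MODEL_MAX_TAIL_CALL_CNT)
            break;      /* limit hit: the tail call is refused */
        cnt++;
        calls++;        /* tail call performed */
    }
    return calls;
}

/* Same loop with "count >= limit", the stricter variant. */
int tail_calls_allowed_ge(void)
{
    int cnt = 0, calls = 0;

    for (;;) {
        if (cnt >= MODEL_MAX_TAIL_CALL_CNT)
            break;
        cnt++;
        calls++;
    }
    return calls;
}
```

Under these assumptions the `>` variant lets 33 tail calls through (a chain of 34 programs) and the `>=` variant lets 32 through, matching the two behaviours discussed in the commit message.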
  4. 18 Aug, 2021 3 commits
  5. 17 Aug, 2021 19 commits
  6. 16 Aug, 2021 9 commits
    • Merge branch 'bpf-perf-link' · 3a4ce01b
      Daniel Borkmann authored
      Andrii Nakryiko says:
      
      ====================
      This patch set implements an ability for users to specify custom black box u64
      value for each BPF program attachment, bpf_cookie, which is available to BPF
      program at runtime. This is a feature that's critically missing for cases when
      some sort of generic processing needs to be done by the common BPF program
      logic (or even exactly the same BPF program) across multiple BPF hooks (e.g.,
      many uniformly handled kprobes) and it's important to be able to distinguish
      between each BPF hook at runtime (e.g., for additional configuration lookup).
      
      The choice of restricting this to a fixed-size 8-byte u64 value is an explicit
      design decision. Making this configurable by users adds unnecessary complexity
      (extra memory allocations, extra complications on the verifier side to validate
      accesses to variable-sized data area) while not really opening up new
      possibilities. If user's use case requires storing more data per attachment,
      it's possible to use either global array, or ARRAY/HASHMAP BPF maps, where
      bpf_cookie would be used as an index into respective storage, populated by
      user-space code before creating BPF link. This gives user all the flexibility
      and control while keeping BPF verifier and BPF helper API simple.
      
      Currently, similar functionality can only be achieved through:
      
        - code-generation and BPF program cloning, which is very complicated and
          unmaintainable;
        - on-the-fly C code generation and further runtime compilation, which is
          what BCC uses and makes pretty simple. The big downside is a very
          heavy-weight Clang/LLVM dependency and inefficient memory usage (due to
          many BPF program clones and the compilation process itself);
        - in some cases (kprobes and sometimes uprobes) it's possible to do function
          IP lookup to get function-specific configuration. This doesn't work for
          all the cases (e.g., when attaching uprobes to shared libraries) and has
          higher runtime overhead and additional programming complexity due to
          BPF_MAP_TYPE_HASHMAP lookups. Up until recently, before bpf_get_func_ip()
          BPF helper was added, it was also very complicated and unstable (API-wise)
          to get traced function's IP from fentry/fexit and kretprobe.
      
      With libbpf and BPF CO-RE, runtime compilation is not an option, so to be able
      to build generic tracing tooling simply and efficiently, ability to provide
      additional bpf_cookie value for each *attachment* (as opposed to each BPF
      program) is extremely important. Two immediate users of this functionality are
      going to be libbpf-based USDT library (currently in development) and retsnoop
      ([0]), but I'm sure more applications will come once users get this feature in
      their kernels.
      
      To achieve the above, all perf_event-based BPF hooks are made available
      through a new BPF_LINK_TYPE_PERF_EVENT BPF link, which allows using the common
      LINK_CREATE command for program attachments and generally brings
      perf_event-based attachments into the common BPF link infrastructure.
      
      With that, LINK_CREATE gains the ability to pass through a bpf_cookie value at
      link creation (BPF program attachment) time. The bpf_get_attach_cookie() BPF
      helper is added to allow fetching this value at runtime from the BPF program side.
      BPF cookie is stored either on struct perf_event itself and fetched from the
      BPF program context, or is passed through ambient BPF run context, added in
      c7603cfa ("bpf: Add ambient BPF runtime context stored in current").
      
      On the libbpf side, a BPF perf link is used whenever the kernel supports it,
      instead of the PERF_EVENT_IOC_SET_BPF ioctl on the perf_event FD.
      All the tracing attach APIs are extended with OPTS and bpf_cookie is passed
      through corresponding opts structs.
      
      The last part of the patch set adds a few selftests utilizing the new APIs.
      
      There are also a few refactorings along the way to make things cleaner and
      easier to work with, both in kernel (BPF_PROG_RUN and BPF_PROG_RUN_ARRAY), and
      throughout libbpf and selftests.
      
      Follow-up patches will extend bpf_cookie to fentry/fexit programs.
      
      While adding uprobe_opts, also extend it with ref_ctr_offset for specifying
      USDT semaphore (reference counter) offset. Update attach_probe selftests to
      validate its functionality. This is another feature (along with bpf_cookie)
      required for implementing libbpf-based USDT solution.
      
        [0] https://github.com/anakryiko/retsnoop
      
      v4->v5:
        - rebase on latest bpf-next to resolve merge conflict;
        - add ref_ctr_offset to uprobe_opts and corresponding selftest;
      v3->v4:
        - get rid of BPF_PROG_RUN macro in favor of bpf_prog_run() (Daniel);
        - move #ifdef CONFIG_BPF_SYSCALL check into bpf_set_run_ctx (Daniel);
      v2->v3:
        - user_ctx -> bpf_cookie, bpf_get_user_ctx -> bpf_get_attach_cookie (Peter);
        - fix BPF_LINK_TYPE_PERF_EVENT value (Jiri);
        - use bpf_prog_run() from bpf_prog_run_pin_on_cpu() (Yonghong);
      v1->v2:
        - fix build failures on non-x86 arches by gating on CONFIG_PERF_EVENTS.
      ====================
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
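The cover letter suggests using the 8-byte cookie as an index into storage that user space fills in before attaching. A minimal plain-C model of that pattern is sketched below; `probe_config`, `probe_configs`, `config_for_cookie`, and `handle_event` are hypothetical names, and in a real BPF program the cookie would come from bpf_get_attach_cookie() and the table would be a global array or a BPF map.

```c
#include <stddef.h>
#include <stdint.h>

/* Per-attachment configuration, populated before the links are created. */
struct probe_config {
    const char *name;
    unsigned int sample_rate;
};

const struct probe_config probe_configs[] = {
    { "open_probe",  1 },    /* attachment created with cookie 0 */
    { "read_probe",  100 },  /* attachment created with cookie 1 */
    { "write_probe", 10 },   /* attachment created with cookie 2 */
};

/* Look up the configuration for a cookie; NULL if the cookie is out of range. */
const struct probe_config *config_for_cookie(uint64_t cookie)
{
    if (cookie >= sizeof(probe_configs) / sizeof(probe_configs[0]))
        return NULL;
    return &probe_configs[cookie];
}

/* One generic handler shared by every attachment; behaviour depends on the cookie. */
unsigned int handle_event(uint64_t cookie)
{
    const struct probe_config *cfg = config_for_cookie(cookie);

    return cfg ? cfg->sample_rate : 0;
}
```

The same handler code serves all three attachments, and only the cookie passed at attach time distinguishes them, which is the property the patch set is after.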
    • selftests/bpf: Add ref_ctr_offset selftests · 4bd11e08
      Andrii Nakryiko authored
      Extend attach_probe selftests to specify ref_ctr_offset for uprobe/uretprobe
      and validate that its value is incremented from zero.
      
      Turns out that once uprobe is attached with ref_ctr_offset, uretprobe for the
      same location/function *has* to use ref_ctr_offset as well, otherwise
      perf_event_open() fails with -EINVAL. So this test uses ref_ctr_offset for
      both uprobe and uretprobe, even though for the purpose of test uprobe would be
      enough.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20210815070609.987780-17-andrii@kernel.org
    • libbpf: Add uprobe ref counter offset support for USDT semaphores · 5e3b8356
      Andrii Nakryiko authored
      When attaching to uprobes through perf subsystem, it's possible to specify
      offset of a so-called USDT semaphore, which is just a reference counted u16,
      used by kernel to keep track of how many tracers are attached to a given
      location. Support for this feature was added in [0], so just wire this through
      uprobe_opts. This is important to enable implementing USDT attachment and
      tracing through libbpf's bpf_program__attach_uprobe_opts() API.
      
        [0] a6ca88b2 ("trace_uprobe: support reference counter in fd-based uprobe")
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20210815070609.987780-16-andrii@kernel.org
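A USDT semaphore as described here is a reference-counted u16 that tracks how many tracers are attached. The sketch below models that counter in userspace; the function names are hypothetical, and in practice the kernel updates the counter located at ref_ctr_offset in the traced file, not the application.

```c
#include <stdint.h>

/* A tracer attaching to the probe location bumps the counter. */
void tracer_attach(uint16_t *semaphore)
{
    (*semaphore)++;
}

/* A tracer detaching drops it again (never below zero in this model). */
void tracer_detach(uint16_t *semaphore)
{
    if (*semaphore > 0)
        (*semaphore)--;
}

/*
 * The traced program can test the counter cheaply and skip expensive
 * argument preparation while nobody is attached.
 */
int probe_enabled(const uint16_t *semaphore)
{
    return *semaphore != 0;
}
```

Because it is a count rather than a flag, the probe stays enabled until the last of several tracers has detached.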
    • selftests/bpf: Add bpf_cookie selftests for high-level APIs · 0a80cf67
      Andrii Nakryiko authored
      Add a selftest with a few subtests testing proper bpf_cookie usage.
      
      Kprobe and uprobe subtests are pretty straightforward and just validate that
      the same BPF program attached with different bpf_cookie will be triggered with
      those different bpf_cookie values.
      
      Tracepoint subtest is a bit more interesting, as it is the only
      perf_event-based BPF hook that shares bpf_prog_array between multiple
      perf_events internally. This means that the same BPF program can't be attached
      to the same tracepoint multiple times. So we have 3 identical copies. This
      arrangement allows testing bpf_prog_array_copy()'s handling of bpf_prog_array
      list manipulation logic when programs are attached and detached.  The test
      validates that bpf_cookie isn't mixed up and isn't lost during such list
      manipulations.
      
      Perf_event subtest validates that two BPF links can be created against the
      same perf_event (but not at the same time, only one BPF program can be
      attached to perf_event itself), and that for each we can specify different
      bpf_cookie value.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20210815070609.987780-15-andrii@kernel.org
    • selftests/bpf: Extract uprobe-related helpers into trace_helpers.{c,h} · a549aaa6
      Andrii Nakryiko authored
      Extract two helpers used for working with uprobes into trace_helpers.{c,h} to
      be re-used between multiple uprobe-using selftests. Also rename get_offset()
      into more appropriate get_uprobe_offset().
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20210815070609.987780-14-andrii@kernel.org
    • selftests/bpf: Test low-level perf BPF link API · f36d3557
      Andrii Nakryiko authored
      Add tests utilizing low-level bpf_link_create() API to create perf BPF link.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20210815070609.987780-13-andrii@kernel.org
    • libbpf: Add bpf_cookie to perf_event, kprobe, uprobe, and tp attach APIs · 47faff37
      Andrii Nakryiko authored
      Wire through bpf_cookie for all attach APIs that use perf_event_open under the
      hood:
        - for kprobes, extend existing bpf_kprobe_opts with bpf_cookie field;
        - for perf_event, uprobe, and tracepoint APIs, add their _opts variants and
          pass bpf_cookie through opts.
      
      For kernels that don't support BPF_LINK_CREATE for perf_events, and thus
      don't support bpf_cookie either, return an error and log a warning for the user.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20210815070609.987780-12-andrii@kernel.org
    • libbpf: Add bpf_cookie support to bpf_link_create() API · 3ec84f4b
      Andrii Nakryiko authored
      Add ability to specify bpf_cookie value when creating BPF perf link with
      bpf_link_create() low-level API.
      
      Given BPF_LINK_CREATE command is growing and keeps getting new fields that are
      specific to the type of BPF_LINK, extend libbpf side of bpf_link_create() API
      and corresponding OPTS struct to accommodate such changes. Add extra checks to
      prevent using incompatible/unexpected combinations of fields.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20210815070609.987780-11-andrii@kernel.org
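One way to picture the "extra checks" on link-type-specific option fields is a validator that rejects fields belonging to a different link type. This is a simplified sketch under assumed names (`link_kind_model`, `link_create_opts_model`, `validate_link_create_opts`); it does not reproduce libbpf's actual struct layout or checking macros, and it treats a nonzero value as "field was set".

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

enum link_kind_model {
    LINK_KIND_ITER,
    LINK_KIND_PERF_EVENT,
    LINK_KIND_OTHER
};

/* Options struct with fields that are each meaningful for one link kind only. */
struct link_create_opts_model {
    const void *iter_info;        /* iterator links only */
    uint32_t iter_info_len;       /* iterator links only */
    uint64_t perf_event_cookie;   /* perf_event links only */
};

/* Returns 0 if the options fit the link kind, -EINVAL on a stray field. */
int validate_link_create_opts(enum link_kind_model kind,
                              const struct link_create_opts_model *opts)
{
    int has_iter = opts->iter_info != NULL || opts->iter_info_len != 0;
    int has_cookie = opts->perf_event_cookie != 0;

    switch (kind) {
    case LINK_KIND_ITER:
        return has_cookie ? -EINVAL : 0;
    case LINK_KIND_PERF_EVENT:
        return has_iter ? -EINVAL : 0;
    default:
        /* Link kinds without extra parameters accept neither group. */
        return (has_iter || has_cookie) ? -EINVAL : 0;
    }
}
```

Rejecting stray fields early means a caller who sets a cookie on a link type that cannot carry one gets an error instead of silently losing the value.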
    • libbpf: Use BPF perf link when supported by kernel · 668ace0e
      Andrii Nakryiko authored
      Detect kernel support for BPF perf link and prefer it when attaching to
      perf_event, tracepoint, kprobe/uprobe. Underlying perf_event FD will be kept
      open until BPF link is destroyed, at which point both perf_event FD and BPF
      link FD will be closed.
      
      This preserves current behavior in which perf_event FD is open for the
      duration of bpf_link's lifetime and user is able to "disconnect" bpf_link from
      underlying FD (with bpf_link__disconnect()), so that bpf_link__destroy()
      doesn't close underlying perf_event FD. When BPF perf link is used, disconnect
      will keep both perf_event and bpf_link FDs open, so it will be up to
      (advanced) user to close them. This approach is demonstrated in bpf_cookie.c
      selftests, added in this patch set.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20210815070609.987780-10-andrii@kernel.org
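The ownership rule described in this commit (destroy closes both FDs unless the link was disconnected first) can be modelled without real file descriptors. The struct and function names below are hypothetical and only mimic the described behaviour; they are not libbpf's implementation.

```c
/* Model of a perf-based link holding two descriptors; -1 means closed. */
struct perf_link_model {
    int perf_event_fd;
    int link_fd;
    int disconnected;
};

/* After disconnect, the caller is responsible for closing both FDs. */
void link_disconnect_model(struct perf_link_model *link)
{
    link->disconnected = 1;
}

/* Marks an FD as closed; returns 1 if it was open. */
int close_model(int *fd)
{
    if (*fd < 0)
        return 0;
    *fd = -1;
    return 1;
}

/* Returns how many FDs were closed: 2 for a connected link, 0 if disconnected. */
int link_destroy_model(struct perf_link_model *link)
{
    if (link->disconnected)
        return 0;   /* FDs stay open; ownership passed to the caller */
    return close_model(&link->link_fd) + close_model(&link->perf_event_fd);
}
```

A caller that disconnects before destroying keeps both descriptors and has to close them itself, which is the "advanced user" case mentioned in the commit message.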