  1. 11 Oct, 2020 10 commits
  2. 09 Oct, 2020 7 commits
  3. 08 Oct, 2020 5 commits
    • Merge branch 'libbpf: auto-resize relocatable LOAD/STORE instructions' · 1e9259ec
      Alexei Starovoitov authored
      Andrii Nakryiko says:
      
      ====================
      This patch set implements logic in libbpf to auto-adjust the memory size (1-, 2-, 4-,
      or 8-byte) of load/store (LD/ST/STX) instructions which have a BPF CO-RE field
      offset relocation associated with them. In practice this means transparent
      handling of 32-bit kernels for both pointer and unsigned integer fields. Signed
      integers are not relocatable with zero-extending loads/stores, so libbpf
      poisons them and generates a warning. If/when BPF gets support for
      sign-extending loads/stores, it would be possible to automatically relocate
      them as well.
      
      All the details are contained in patch #2 comments and commit message.
      Patch #3 is a simple change in libbpf to make advanced testing with custom BTF
      easier. Patch #4 validates correct uses of auto-resizable loads, as well as
      checks that libbpf fails invalid uses. Patch #1 skips CO-RE relocation for
      programs that had bpf_program__set_autoload(prog, false) set on them, reducing
      warnings and noise.
      
      v2->v3:
        - fix copyright (Alexei);
      v1->v2:
        - more consistent names for instruction mem size conversion routines (Alexei);
        - extended selftests to use relocatable STX instructions (Alexei);
        - added a fix for skipping CO-RE relocation for non-loadable programs.
      
      Cc: Luka Perkov <luka.perkov@sartura.hr>
      Cc: Tony Ambardar <tony.ambardar@gmail.com>
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      1e9259ec
    • selftests/bpf: Validate libbpf's auto-sizing of LD/ST/STX instructions · 888d83b9
      Andrii Nakryiko authored
      Add selftests validating libbpf's auto-resizing of load/store instructions
      when used with CO-RE relocations. An explicit and manual approach using
      bpf_core_read() is also demonstrated and tested. A separate BPF program is
      expected to fail due to using signed integers of sizes that differ from the
      kernel's sizes.
      
      To reliably simulate 32-bit BTF (i.e., BTF with sizeof(long) ==
      sizeof(void *) == 4), the selftest generates its own custom BTF and passes it
      as a replacement for the real kernel BTF. This allows testing the 32/64-bit
      mix on all architectures.
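      
      A minimal sketch of how such a custom 32-bit BTF could be constructed with
      libbpf's BTF writer API (the specific types and field names below are
      illustrative, not the selftest's exact contents):
      
        #include <bpf/btf.h>
      
        /* model a 32-bit kernel: 4-byte long and 4-byte pointers */
        struct btf *btf = btf__new_empty();
        btf__set_pointer_size(btf, 4);
      
        int ulong_id = btf__add_int(btf, "unsigned long", 4, 0);
        int ptr_id = btf__add_ptr(btf, 0);                 /* void * */
        int st_id = btf__add_struct(btf, "task_struct", 8);
        btf__add_field(btf, "stack", ptr_id, 0, 0);        /* illustrative fields */
        btf__add_field(btf, "flags", ulong_id, 32, 0);
      
        /* serialize; the raw BTF can then be handed to libbpf in place of kernel BTF */
        __u32 raw_sz;
        const void *raw = btf__get_raw_data(btf, &raw_sz);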
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20201008001025.292064-5-andrii@kernel.org
      888d83b9
    • libbpf: Allow specifying both ELF and raw BTF for CO-RE BTF override · 2b7d88c2
      Andrii Nakryiko authored
      Use generalized BTF parsing logic, making it possible to parse BTF both from
      an ELF file and from a raw BTF dump. This makes it easier to write custom
      tests with manually generated BTFs.
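      
      As a sketch of what this enables (error handling elided), either file form
      can now be handed to the same generalized entry point:
      
        #include <bpf/btf.h>
      
        /* path may point either at an ELF object carrying a .BTF section or at
         * a raw BTF dump; btf__parse() accepts both */
        struct btf *custom_btf = btf__parse("/path/to/custom_btf", NULL);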
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20201008001025.292064-4-andrii@kernel.org
      2b7d88c2
    • libbpf: Support safe subset of load/store instruction resizing with CO-RE · a66345bc
      Andrii Nakryiko authored
      Add support for patching instructions of the following form:
        - rX = *(T *)(rY + <off>);
        - *(T *)(rX + <off>) = rY;
        - *(T *)(rX + <off>) = <imm>, where T is one of {u8, u16, u32, u64}.
      
      For such instructions, if the actual kernel field recorded in the CO-RE
      relocation has a different size than the one recorded locally (e.g., from
      vmlinux.h), then libbpf will adjust T to an appropriate 1-, 2-, 4-, or 8-byte
      load.
      
      In general, such a transformation is not always correct and could lead to an
      invalid final value being loaded or stored. But two classes of cases are
      always safe:
        - if both local and target (kernel) types are unsigned integers, but of
        different sizes, then it's OK to adjust the load/store instruction to the
        necessary memory size. The zero-extending nature of such instructions and
        their unsignedness ensure that the final value is always correct;
        - a pointer size mismatch between the BPF target architecture (which is
        always 64-bit) and a 32-bit host kernel architecture can be similarly
        resolved automatically, because a pointer is essentially an unsigned
        integer. Loading a 32-bit pointer into a 64-bit BPF register with zero
        extension will leave a correct pointer in the register.
      
      Both cases are necessary to support CO-RE on 32-bit kernels, as `unsigned
      long` in vmlinux.h generated from a 32-bit kernel is 32-bit, but when a BPF
      program is compiled for the BPF target the compiler treats it as a 64-bit
      integer. Similarly, pointers in vmlinux.h are 32-bit for the kernel, but are
      treated as 64-bit values by the compiler for the BPF target. Both problems are
      now resolved by libbpf for direct memory reads.
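      
      As an illustrative sketch (program type and fields here are just examples,
      not part of the patch), a direct CO-RE-relocatable read that benefits from
      this looks like:
      
        #include "vmlinux.h"
        #include <bpf/bpf_helpers.h>
        #include <bpf/bpf_tracing.h>
      
        SEC("tp_btf/sched_switch")
        int BPF_PROG(on_switch, bool preempt, struct task_struct *prev,
                     struct task_struct *next)
        {
                /* `unsigned long` is 4 bytes in 32-bit-kernel BTF but 8 bytes for
                 * the BPF target; libbpf resizes the load accordingly */
                unsigned long nvcsw = next->nvcsw;
                /* kernel pointers are 4 bytes on 32-bit kernels; zero-extended
                 * into the 64-bit BPF register */
                void *stack = next->stack;
      
                bpf_printk("nvcsw=%lu stack=%p", nvcsw, stack);
                return 0;
        }
      
        char LICENSE[] SEC("license") = "GPL";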
      
      But similar transformations are useful in general when kernel fields are
      "resized" from, e.g., unsigned int to unsigned long (or vice versa).
      
      Now, similar transformations for signed integers are not safe to perform, as
      they would result in incorrect sign extension of the value. If such a situation
      is detected, libbpf will emit a helpful message and will poison the instruction.
      Not failing immediately means that it's possible to guard the instruction
      based on kernel version (or other conditions) and make sure it's not
      reachable.
      
      If there is a need to read signed integers that change sizes between different
      kernels, it's possible to use the BPF_CORE_READ_BITFIELD() macro, which works
      both with bitfields and non-bitfield integers of any signedness and handles
      sign extension properly. Also, bpf_core_read() with the proper size and/or use
      of the bpf_core_field_size() relocation makes it possible to deal with such
      complicated situations explicitly, if not as conveniently as direct memory
      reads.
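      
      As a hedged sketch of the explicit, guarded approach (the field choice is
      illustrative, and this assumes a BTF-enabled program where tsk is a kernel
      task_struct pointer):
      
        #include <bpf/bpf_core_read.h>
      
        long long val = 0;
        /* bpf_core_field_size() is relocated to the target kernel's field size */
        __u32 sz = bpf_core_field_size(tsk->exit_code);
      
        if (sz == sizeof(tsk->exit_code)) {
                /* sizes match: direct read is safe, no resizing involved */
                val = tsk->exit_code;
        } else {
                /* sizes differ: read explicitly with the relocated size; the
                 * caller is responsible for any needed sign extension */
                bpf_core_read(&val, sz, &tsk->exit_code);
        }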
      
      Selftests added in a separate patch in progs/test_core_autosize.c demonstrate
      both direct memory and probed use cases.
      
      BPF_CORE_READ() is not changed and won't deal with such situations as
      automatically as direct memory reads do, due to the signed integer
      limitations, which are much harder to detect and control with compiler macro
      magic. So it's encouraged to utilize direct memory reads as much as possible.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20201008001025.292064-3-andrii@kernel.org
      a66345bc
    • libbpf: Skip CO-RE relocations for not loaded BPF programs · 47f7cf63
      Andrii Nakryiko authored
      Bypass the CO-RE relocations step for BPF programs that are not going to be
      loaded. This allows BPF programs to be compiled in but disabled dynamically
      when the kernel is not expected to provide enough relocation information. In
      such a case, there won't be unnecessary warnings about failed relocations.
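      
      A minimal usage sketch (object and program names are illustrative; error
      handling elided):
      
        struct bpf_object *obj = bpf_object__open_file("probe.bpf.o", NULL);
        struct bpf_program *prog =
                bpf_object__find_program_by_name(obj, "optional_prog");
      
        /* this program won't be loaded, and with this fix its CO-RE
         * relocations are skipped as well, so no spurious warnings */
        if (prog)
                bpf_program__set_autoload(prog, false);
      
        bpf_object__load(obj);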
      
      Fixes: d9297581 ("libbpf: Support disabling auto-loading BPF programs")
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20201008001025.292064-2-andrii@kernel.org
      47f7cf63
  4. 07 Oct, 2020 5 commits
  5. 06 Oct, 2020 10 commits
  6. 05 Oct, 2020 3 commits
    • bpf, doc: Update Andrii's email in MAINTAINERS · dca4121c
      Andrii Nakryiko authored
      Update Andrii Nakryiko's reviewer email to his kernel.org account. This
      optimizes email logistics on my side and makes it less likely for me to miss
      important patches.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20201005223648.2437130-1-andrii@kernel.org
      dca4121c
    • bpf: Use raw_spin_trylock() for pcpu_freelist_push/pop in NMI · 39d8f0d1
      Song Liu authored
      Recent improvements in LOCKDEP highlighted a potential A-A deadlock with
      pcpu_freelist in NMI:
      
      ./tools/testing/selftests/bpf/test_progs -t stacktrace_build_id_nmi
      
      [   18.984807] ================================
      [   18.984807] WARNING: inconsistent lock state
      [   18.984808] 5.9.0-rc6-01771-g1466de1330e1 #2967 Not tainted
      [   18.984809] --------------------------------
      [   18.984809] inconsistent {INITIAL USE} -> {IN-NMI} usage.
      [   18.984810] test_progs/1990 [HC2[2]:SC0[0]:HE0:SE1] takes:
      [   18.984810] ffffe8ffffc219c0 (&head->lock){....}-{2:2}, at: __pcpu_freelist_pop+0xe3/0x180
      [   18.984813] {INITIAL USE} state was registered at:
      [   18.984814]   lock_acquire+0x175/0x7c0
      [   18.984814]   _raw_spin_lock+0x2c/0x40
      [   18.984815]   __pcpu_freelist_pop+0xe3/0x180
      [   18.984815]   pcpu_freelist_pop+0x31/0x40
      [   18.984816]   htab_map_alloc+0xbbf/0xf40
      [   18.984816]   __do_sys_bpf+0x5aa/0x3ed0
      [   18.984817]   do_syscall_64+0x2d/0x40
      [   18.984818]   entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [   18.984818] irq event stamp: 12
      [...]
      [   18.984822] other info that might help us debug this:
      [   18.984823]  Possible unsafe locking scenario:
      [   18.984823]
      [   18.984824]        CPU0
      [   18.984824]        ----
      [   18.984824]   lock(&head->lock);
      [   18.984826]   <Interrupt>
      [   18.984826]     lock(&head->lock);
      [   18.984827]
      [   18.984828]  *** DEADLOCK ***
      [   18.984828]
      [   18.984829] 2 locks held by test_progs/1990:
      [...]
      [   18.984838]  <NMI>
      [   18.984838]  dump_stack+0x9a/0xd0
      [   18.984839]  lock_acquire+0x5c9/0x7c0
      [   18.984839]  ? lock_release+0x6f0/0x6f0
      [   18.984840]  ? __pcpu_freelist_pop+0xe3/0x180
      [   18.984840]  _raw_spin_lock+0x2c/0x40
      [   18.984841]  ? __pcpu_freelist_pop+0xe3/0x180
      [   18.984841]  __pcpu_freelist_pop+0xe3/0x180
      [   18.984842]  pcpu_freelist_pop+0x17/0x40
      [   18.984842]  ? lock_release+0x6f0/0x6f0
      [   18.984843]  __bpf_get_stackid+0x534/0xaf0
      [   18.984843]  bpf_prog_1fd9e30e1438d3c5_oncpu+0x73/0x350
      [   18.984844]  bpf_overflow_handler+0x12f/0x3f0
      
      This is because pcpu_freelist_head.lock is accessed in both NMI and
      non-NMI context. Fix this issue by using raw_spin_trylock() in NMI.
      
      Since NMI interrupts non-NMI context, when the NMI context tries to lock the
      raw_spinlock, the non-NMI context of the same CPU may already hold the lock
      and be prevented from unlocking it. For a system with N CPUs, there could be
      N NMIs at the same time, and they may block N non-NMI raw_spinlocks. This is
      tricky for pcpu_freelist_push(), where, unlike _pop(), a failing _push() means
      leaking memory. This issue is more likely to trigger on a non-SMP system.
      
      Fix this issue with an extra list, pcpu_freelist.extralist. The extralist
      is primarily used to take _push() operations when raw_spin_trylock() fails on
      all the per CPU lists. It should be empty most of the time. The following
      table summarizes the behavior of pcpu_freelist in NMI and non-NMI context
      (a simplified sketch of the NMI push path follows the table):
      
      non-NMI pop(): 	use _lock(); check per CPU lists first;
                      if all per CPU lists are empty, check extralist;
                      if extralist is empty, return NULL.
      
      non-NMI push(): use _lock(); only push to per CPU lists.
      
      NMI pop():    use _trylock(); check per CPU lists first;
                    if all per CPU lists are locked or empty, check extralist;
                    if extralist is locked or empty, return NULL.
      
      NMI push():   use _trylock(); check per CPU lists first;
                    if all per CPU lists are locked, try to push to extralist;
                    if extralist is also locked, keep trying on per CPU lists.
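      
      A simplified, non-verbatim sketch of that NMI push path (assuming the
      pcpu_freelist_head layout of a `first' pointer plus a raw spinlock; this is
      not the patch's exact code):
      
        static void pcpu_freelist_push_nmi_sketch(struct pcpu_freelist *s,
                                                  struct pcpu_freelist_node *node)
        {
                int cpu, orig_cpu;
      
                orig_cpu = cpu = raw_smp_processor_id();
                while (1) {
                        struct pcpu_freelist_head *head = per_cpu_ptr(s->freelist, cpu);
      
                        /* per CPU lists first */
                        if (raw_spin_trylock(&head->lock)) {
                                node->next = head->first;
                                head->first = node;
                                raw_spin_unlock(&head->lock);
                                return;
                        }
                        cpu = cpumask_next(cpu, cpu_possible_mask);
                        if (cpu >= nr_cpu_ids)
                                cpu = 0;
      
                        /* a full round of per CPU lists was locked: try extralist */
                        if (cpu == orig_cpu &&
                            raw_spin_trylock(&s->extralist.lock)) {
                                node->next = s->extralist.first;
                                s->extralist.first = node;
                                raw_spin_unlock(&s->extralist.lock);
                                return;
                        }
                        /* extralist locked too: keep trying the per CPU lists */
                }
        }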
      Reported-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20201005165838.3735218-1-songliubraving@fb.com
      39d8f0d1
    • Gustavo A. R. Silva