1. 07 Dec, 2022 11 commits
    • Merge branch 'Refactor verifier prune and jump point handling' · f0c5a2d9
      Alexei Starovoitov authored
      Andrii Nakryiko says:
      
      ====================
      
      Disentangle prune and jump points in the BPF verifier code. They are
      conceptually independent but are currently coupled together. This small
      patch set refactors the related code and makes it possible to mark an
      instruction as a prune point or a jump point independently.
      
      Besides conceptual cleanliness, this allows removing unnecessary jump
      points (potentially saving a tiny bit of performance and memory usage)
      and, more importantly, it allows for cleanly adding special pruning
      points, similarly to how it's done for BPF_FUNC_timer_set_callback. This
      will be used by future patches implementing open-coded BPF iterators.
      
      v1->v2:
        - clarified patch #3's commit message and a comment in the code (John);
        - added back mark_jmp_point() to right after subprog call to record
          non-linear implicit jump from BPF_EXIT to right after CALL <subprog>.
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: remove unnecessary prune and jump points · 618945fb
      Andrii Nakryiko authored
      Don't mark some instructions as jump points when there are actually no
      jumps and instructions are just processed sequentially. Such a case is
      handled naturally by the precision backtracking logic without the need to
      update jump history. See get_prev_insn_idx(): it goes back linearly by
      one instruction, unless the current top of jmp_history points to the
      current instruction, in which case `st->jmp_history[cnt - 1].prev_idx` is
      used to find the instruction from which we jumped to the current
      instruction non-linearly.
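
The step-back logic described above can be sketched as follows. This is an illustrative stand-in only: the struct names, field types, and function name are simplified, not the actual definitions from kernel/bpf/verifier.c.

```c
#include <assert.h>

/* Simplified stand-ins for the verifier's jump history (not kernel code). */
struct jmp_entry {
	int prev_idx; /* instruction index we jumped from */
	int idx;      /* instruction index we jumped to */
};

struct verifier_state_sketch {
	struct jmp_entry *jmp_history;
	int jmp_history_cnt;
};

/* Step back by one instruction, unless the top of jmp_history records a
 * non-linear jump landing exactly on the current instruction. */
static int get_prev_insn_idx_sketch(const struct verifier_state_sketch *st,
				    int cur_idx)
{
	int cnt = st->jmp_history_cnt;

	if (cnt > 0 && st->jmp_history[cnt - 1].idx == cur_idx)
		return st->jmp_history[cnt - 1].prev_idx;
	return cur_idx - 1;
}
```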
      
      Also remove both jump and prune point marking for the instruction right
      after an unconditional jump: program flow can reach the instruction
      following an unconditional jump only if there is a jump to it from
      somewhere else in the program, in which case that instruction will
      already be marked as a prune/jump point because it's the destination of
      a jump.
      
      This change makes no difference in the number of instructions or states
      processed across Cilium and selftest programs.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/r/20221206233345.438540-4-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: mostly decouple jump history management from is_state_visited() · a095f421
      Andrii Nakryiko authored
      Jump history updating and state equivalence checks are conceptually
      independent, so move push_jmp_history() out of is_state_visited(). Also
      move the decision of whether to perform state equivalence checks one
      layer higher, into do_check(), keeping is_state_visited() unconditionally
      performing the state checks.
      
      push_jmp_history() should be performed after state checks. There is just
      one small non-uniformity. When is_state_visited() finds already
      validated equivalent state, it propagates precision marks to current
      state's parent chain. For this to work correctly, jump history has to be
      updated, so is_state_visited() is doing that internally.
      
      But if no equivalent verified state is found, jump history has to be
      updated in a newly cloned child state, so is_jmp_point()
      + push_jmp_history() is performed after is_state_visited() exited with
      zero result, which means "proceed with validation".
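
The restructured flow can be sketched roughly as below. All names here are hypothetical stand-ins; the real logic is spread across do_check() and is_state_visited() in kernel/bpf/verifier.c.

```c
#include <assert.h>

/* Minimal stand-ins for verifier state (not kernel code). */
static int jmp_history[64], jmp_history_cnt;
static int prune_point[64], jmp_point[64], visited[64];

static int is_state_visited_sketch(int idx) { return visited[idx]; }
static void push_jmp_history_sketch(int idx) { jmp_history[jmp_history_cnt++] = idx; }

/* Equivalence check first (only at prune points); jump history is pushed
 * afterwards, on the "proceed with validation" path. */
static int process_insn_sketch(int idx)
{
	if (prune_point[idx] && is_state_visited_sketch(idx))
		return 1;                 /* equivalent state found: prune path */
	if (jmp_point[idx])
		push_jmp_history_sketch(idx);
	return 0;                         /* proceed with validation */
}
```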
      
      There are no functional changes. The change is not strictly necessary,
      but it feels right to decouple these two processes.
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20221206233345.438540-3-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: decouple prune and jump points · bffdeaa8
      Andrii Nakryiko authored
      BPF verifier marks some instructions as prune points. Currently these
      prune points serve two purposes.
      
      First, it's a point where the verifier tries to find a previously
      verified state and checks the current state's equivalence to it, to
      short-circuit verification of the current code path.

      But currently it's also a point where jump history, used for precision
      backtracking, is updated, so that non-linear flow of execution can be
      properly backtracked.
      
      Such coupling is coincidental and unnecessary. Some prune points are not
      part of any non-linear jump path, so they don't need jump history
      updates. On the other hand, not all instructions which have to be
      recorded in jump history are necessarily good prune points.
      
      This patch splits prune and jump points into independent flags. For now,
      all prune points are also marked as jump points to minimize the amount of
      changes in this patch, but the next patch will optimize prune vs jump
      point placement.
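
As a sketch, two independent markings might look like this. The flag names and storage are hypothetical; the kernel actually tracks these marks per instruction in struct bpf_insn_aux_data.

```c
#include <assert.h>

/* Hypothetical per-instruction flags, not the kernel's actual encoding. */
#define INSN_F_PRUNE_POINT (1u << 0)
#define INSN_F_JMP_POINT   (1u << 1)

static unsigned insn_flags[16];

static void mark_prune_point(int idx) { insn_flags[idx] |= INSN_F_PRUNE_POINT; }
static void mark_jmp_point(int idx)   { insn_flags[idx] |= INSN_F_JMP_POINT; }
static int is_prune_point(int idx) { return insn_flags[idx] & INSN_F_PRUNE_POINT; }
static int is_jmp_point(int idx)   { return insn_flags[idx] & INSN_F_JMP_POINT; }
```

With separate flags, an instruction can be a prune point without forcing a jump-history update, and vice versa.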
      
      No functional changes are intended.
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20221206233345.438540-2-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Loosen alloc obj test in verifier's reg_btf_record · d8939cb0
      Dave Marchevsky authored
      btf->struct_meta_tab is populated by btf_parse_struct_metas in btf.c.
      There, a BTF record is created for any type containing a spin_lock or
      any next-gen data structure node/head.
      
      Currently, for non-MAP_VALUE types, reg_btf_record will only search for
      a record using struct_meta_tab if the reg->type exactly matches
      (PTR_TO_BTF_ID | MEM_ALLOC). This exact match is too strict: an
      "allocated obj" type - returned from bpf_obj_new - might pick up other
      flags while working its way through the program.
      
      Loosen the check to be exact for base_type and just use MEM_ALLOC mask
      for type_flag.
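
The before/after check can be sketched like this. The constant values below are made up for illustration; the real flag encodings and the base_type()/type_flag() helpers live in the kernel's BPF headers.

```c
#include <assert.h>

/* Made-up encodings for illustration only. */
#define BASE_TYPE_MASK 0xffu
#define PTR_TO_BTF_ID  0x01u
#define MEM_ALLOC      (1u << 8)
#define PTR_UNTRUSTED  (1u << 9)

static unsigned base_type(unsigned t) { return t & BASE_TYPE_MASK; }
static unsigned type_flag(unsigned t) { return t & ~BASE_TYPE_MASK; }

/* Old, too-strict check: exact match on the full type. */
static int is_alloc_obj_strict(unsigned t)
{
	return t == (PTR_TO_BTF_ID | MEM_ALLOC);
}

/* New check: exact base type, but only mask-test MEM_ALLOC in the flags. */
static int is_alloc_obj_loose(unsigned t)
{
	return base_type(t) == PTR_TO_BTF_ID && (type_flag(t) & MEM_ALLOC);
}
```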
      
      This patch is marked Fixes because reg_btf_record was likely never
      intended to fail to find the btf_record for valid alloc obj types with
      additional flags, some of which (e.g. PTR_UNTRUSTED) are valid register
      type states for alloc obj pointers independent of this series. However, I
      didn't find a specific broken repro case outside of this series' added
      functionality, so it's possible that nothing triggered this logic error
      before.
      Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
      Cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Fixes: 4e814da0 ("bpf: Allow locking bpf_spin_lock in allocated objects")
      Link: https://lore.kernel.org/r/20221206231000.3180914-2-davemarchevsky@fb.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Don't use rcu_users to refcount in task kfuncs · 156ed20d
      David Vernet authored
      A series of prior patches added some kfuncs that allow struct
      task_struct * objects to be used as kptrs. These kfuncs leveraged the
      'refcount_t rcu_users' field of the task for performing refcounting.
      This field was used instead of 'refcount_t usage', as we wanted to
      leverage the safety provided by RCU for ensuring a task's lifetime.
      
      A struct task_struct is refcounted by two different refcount_t fields:
      
      1. p->usage:     The "true" refcount field, which tracks task lifetime.
      		 The task is freed as soon as this refcount drops to 0.
      
      2. p->rcu_users: An "RCU users" refcount field which is statically
      		 initialized to 2, and is co-located in a union with
      		 a struct rcu_head field (p->rcu). p->rcu_users
      		 essentially encapsulates a single p->usage
      		 refcount, and when p->rcu_users goes to 0, an RCU
      		 callback is scheduled on the struct rcu_head which
      		 decrements the p->usage refcount.
      
      Our logic was that, by using p->rcu_users, we could rely on RCU to safely
      issue refcount_inc_not_zero() on a task's rcu_users field to determine
      whether the task could still be acquired or was exiting.
      Unfortunately, this does not work due to p->rcu_users and p->rcu sharing
      a union. When p->rcu_users goes to 0, an RCU callback is scheduled to
      drop a single p->usage refcount, and because the fields share a union,
      the refcount immediately becomes nonzero again after the callback is
      scheduled.
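
A simplified view of the layout described above (field types are reduced for illustration; see struct task_struct in include/linux/sched.h for the real definition):

```c
#include <assert.h>
#include <stddef.h>

/* Simplified: rcu_users shares storage with the rcu_head that is used to
 * schedule the "drop one usage refcount" callback. */
struct rcu_head_sketch {
	struct rcu_head_sketch *next;
	void (*func)(struct rcu_head_sketch *);
};

struct task_sketch {
	int usage; /* "true" refcount; task freed when it hits 0 */
	union {
		int rcu_users;              /* statically initialized to 2 */
		struct rcu_head_sketch rcu; /* overwritten when callback is queued */
	};
};
```

Because both union members start at the same address, queueing the RCU callback writes over the bytes that held rcu_users, which is why the count appears nonzero again after the callback is scheduled.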
      
      If we were to split the fields out of the union, this wouldn't be a
      problem. Doing so should also be rather non-controversial, as there are
      a number of places in struct task_struct that have padding which we
      could use to avoid growing the structure by splitting up the fields.
      
      For now, so as to fix the kfuncs to be correct, this patch instead
      updates bpf_task_acquire() and bpf_task_release() to use the p->usage
      field for refcounting via the get_task_struct() and put_task_struct()
      functions. Because we can no longer rely on RCU, the change also guts
      the bpf_task_acquire_not_zero() and bpf_task_kptr_get() functions
      pending a resolution on the above problem.
      
      In addition, the patch fixes the kfunc and rcu_read_lock selftests to
      expect this new behavior.
      
      Fixes: 90660309 ("bpf: Add kfuncs for storing struct task_struct * as a kptr")
      Fixes: fca1aa75 ("bpf: Handle MEM_RCU type properly")
      Reported-by: Matus Jokay <matus.jokay@stuba.sk>
      Signed-off-by: David Vernet <void@manifault.com>
      Link: https://lore.kernel.org/r/20221206210538.597606-1-void@manifault.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • Merge branch 'BPF selftests fixes' · 235d2ef2
      Andrii Nakryiko authored
      Daan De Meyer says:
      
      ====================
      
      This patch series fixes a few issues I've found while integrating the
      bpf selftests into systemd's mkosi development environment.
      ====================
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    • selftests/bpf: Use CONFIG_TEST_BPF=m instead of CONFIG_TEST_BPF=y · d0c0b48c
      Daan De Meyer authored
      CONFIG_TEST_BPF can only be a module, so let's indicate it as such in
      the selftests config.
      Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20221205131618.1524337-4-daan.j.demeyer@gmail.com
    • selftests/bpf: Use "is not set" instead of "=n" · efe7fadb
      Daan De Meyer authored
      "=n" is not valid kconfig syntax. Use "is not set" instead to indicate
      the option should be disabled.
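
For a hypothetical option CONFIG_FOO, the two forms look like this in a config fragment; only the second is valid kconfig syntax:

```
# Invalid; "=n" is not recognized by kconfig tooling:
CONFIG_FOO=n

# Correct way to record that an option is disabled:
# CONFIG_FOO is not set
```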
      Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20221205131618.1524337-3-daan.j.demeyer@gmail.com
    • selftests/bpf: Install all required files to run selftests · d68ae498
      Daan De Meyer authored
      When installing the selftests using
      "make -C tools/testing/selftests install", we need to make sure
      all the required files to run the selftests are installed. Let's
      make sure this is the case.
      Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20221205131618.1524337-2-daan.j.demeyer@gmail.com
    • libbpf: Parse usdt args without offset on x86 (e.g. 8@(%rsp)) · c21dc529
      Timo Hunziker authored
      Parse USDT arguments like "8@(%rsp)" on x86. These are emitted by
      SystemTap. The argument syntax is similar to the existing "memory
      dereference case", but with the offset left out because it's zero (i.e.,
      read the value from the address in the register). We treat it the same
      as the "memory dereference case", but set the offset to 0.
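
A standalone sketch of the two spec forms described above. This is a simplified stand-in for libbpf's parser in usdt.c; the function name and format strings here are illustrative, not libbpf's exact ones.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse "N@off(%reg)" and "N@(%reg)", treating a missing offset as 0. */
static int parse_usdt_arg_sketch(const char *spec, int *sz, char *reg, long *off)
{
	int n = 0;

	if (sscanf(spec, "%d@%ld(%%%15[^)])%n", sz, off, reg, &n) == 3 &&
	    spec[n] == '\0')
		return 0; /* explicit offset, e.g. "8@-4(%rbp)" */
	*off = 0;
	if (sscanf(spec, "%d@(%%%15[^)])%n", sz, reg, &n) == 2 &&
	    spec[n] == '\0')
		return 0; /* offset omitted, e.g. "8@(%rsp)" */
	return -1;
}
```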
      
      I've tested that this fixes the "unrecognized arg #N spec: 8@(%rsp).."
      error I've run into when attaching to a probe with such an argument.
      Attaching and reading the correct argument values works.
      
      Something similar might be needed for the other supported
      architectures.
      
        [0] Closes: https://github.com/libbpf/libbpf/issues/559
      Signed-off-by: Timo Hunziker <timo.hunziker@gmx.ch>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20221203123746.2160-1-timo.hunziker@eclipso.ch
  2. 06 Dec, 2022 8 commits
  3. 05 Dec, 2022 6 commits
  4. 04 Dec, 2022 5 commits
  5. 03 Dec, 2022 2 commits
  6. 02 Dec, 2022 3 commits
  7. 01 Dec, 2022 5 commits