1. 09 Dec, 2022 7 commits
    • bpf: Rework check_func_arg_reg_off · 184c9bdb
      Kumar Kartikeya Dwivedi authored
      While check_func_arg_reg_off is the place that performs generic checks
      needed by various candidates of reg->type, it also contains handling
      for special cases, like ARG_PTR_TO_DYNPTR, OBJ_RELEASE, and
      ARG_PTR_TO_RINGBUF_MEM.
      
      This commit aims to streamline these special cases and instead leave
      other things up to argument-type-specific code to handle. The function
      will be restrictive by default, and cover all possible cases when
      OBJ_RELEASE is set, without having to update the function again
      (forgetting to do so would be a bug).
      
      This is done primarily for two reasons. First, associating each
      reg->type back to its argument leaves room for the list to get out of
      sync when a new reg->type is supported by an arg_type.
      
      The other case is ARG_PTR_TO_RINGBUF_MEM. The problem there is one we
      already handle: whenever a release argument is expected, it must be
      passed as the exact pointer that was received from the acquire
      function, hence with zero fixed and variable offset.
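
      For intuition, a minimal BPF-side sketch of the acquire/release
      pairing (the map, section, and program names here are illustrative,
      not from the patch):

      #include <linux/bpf.h>
      #include <bpf/bpf_helpers.h>

      struct {
              __uint(type, BPF_MAP_TYPE_RINGBUF);
              __uint(max_entries, 4096);
      } rb SEC(".maps");

      SEC("tp/syscalls/sys_enter_getpid")
      int ringbuf_release_example(void *ctx)
      {
              char *p = bpf_ringbuf_reserve(&rb, 8, 0); /* acquire */

              if (!p)
                      return 0;
              /* OK: released with the exact pointer the acquire returned. */
              bpf_ringbuf_submit(p, 0);
              /* Rejected by the verifier: bpf_ringbuf_submit(p + 1, 0);
               * a release argument must have zero fixed and variable offset.
               */
              return 0;
      }

      char _license[] SEC("license") = "GPL";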
      
      There is nothing special about ARG_PTR_TO_RINGBUF_MEM in this regard:
      its target register type, PTR_TO_MEM | MEM_RINGBUF, can already be
      passed with non-zero offset to other helper functions, which makes
      sense.
      
      Hence, lift the arg_type_is_release check for reg->off and cover all
      possible register types, instead of duplicating the same kind of check
      twice for current OBJ_RELEASE arg_types (alloc_mem and ptr_to_btf_id).
      
      For the release argument, arg_type_is_dynptr is the special case: we
      reach the actual object being freed through the dynptr, so the pointer
      may still carry fixed and variable offsets, and process_dynptr_func
      will verify them later for the release argument case as well.
      
      This is not specific to ARG_PTR_TO_DYNPTR, though; we will need to
      make this exception for any future object on the stack that needs to
      be released. In this sense, PTR_TO_STACK as a candidate for an
      object-on-stack argument is a special case for release offset checks,
      and those checks need to be done by the helper releasing the object
      on the stack.
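
      A hedged sketch of the lifted check's shape (identifiers are the
      verifier's, but the structure is illustrative, not the literal diff):

      	if (arg_type_is_release(arg_type)) {
      		if (arg_type_is_dynptr(arg_type)) {
      			/* Object on stack reached through a dynptr:
      			 * fixed/variable offsets are permitted here and
      			 * verified later in process_dynptr_func().
      			 */
      		} else if (reg->off || !tnum_is_const(reg->var_off) ||
      			   reg->var_off.value) {
      			verbose(env, "R%d must have zero offset when passed to release func\n",
      				regno);
      			return -EINVAL;
      		}
      	}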
      
      Since the check has been lifted above all register type checks, remove
      the duplicated check that is being done for PTR_TO_BTF_ID.
      Acked-by: Joanne Koong <joannelkoong@gmail.com>
      Acked-by: David Vernet <void@manifault.com>
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Link: https://lore.kernel.org/r/20221207204141.308952-5-memxor@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Rework process_dynptr_func · 27060531
      Kumar Kartikeya Dwivedi authored
      Recently, user ringbuf support introduced a PTR_TO_DYNPTR register
      type for use in callback state, because in the case of user ringbuf
      helpers, there is no dynptr on the stack that is passed into the
      callback. To reflect such a state, a special register type was
      created.
      
      However, some checks were incorrectly bypassed when this feature was
      added. First, helpers whose arg_type carries the MEM_UNINIT flag,
      i.e. those which initialize a dynptr, must reject such a register
      type. Secondly, in the future there are plans to add dynptr helpers
      that operate on the dynptr itself and may change its offset and other
      properties.

      In all of these cases, PTR_TO_DYNPTR shouldn't be allowed to be
      passed to such helpers; however, the current code simply returns 0.
      
      The rejection for helpers that release the dynptr is already handled.
      
      To fix this, we take a step back and rework the existing code in a
      way that will allow fitting in all classes of helpers and provides a
      coherent model for dealing with the variety of use cases in which
      dynptr is used.
      
      First, ARG_PTR_TO_DYNPTR can either be set alone or together with a
      DYNPTR_TYPE_* constant that denotes the only dynptr type it accepts.
      
      Next, helpers which initialize a dynptr use MEM_UNINIT to indicate
      this fact. To make the distinction clear, use the MEM_RDONLY flag to
      indicate that the helper only operates on the memory pointed to by
      the dynptr, not the dynptr itself. In C parlance, this is equivalent
      to taking the dynptr as a pointer-to-const argument.
      
      When neither of these flags is present, the helper is allowed to
      mutate both the dynptr itself and the memory it points to. Currently,
      the read-only status of the memory is not tracked in the dynptr, but
      it would be trivial to add this support inside the dynptr state of
      the register.
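
      As a sketch, the resulting arg_type compositions in helper protos
      look like the following (the pairings shown are assumptions for
      illustration, not quotes from the patch):

      	/* release: accepts one specific dynptr type only */
      	.arg1_type = ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_RINGBUF | OBJ_RELEASE,
      	/* initialization: helper fills in an uninitialized dynptr */
      	.arg4_type = ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_LOCAL | MEM_UNINIT,
      	/* read-only: helper touches only the memory behind the dynptr */
      	.arg1_type = ARG_PTR_TO_DYNPTR | MEM_RDONLY,
      	/* no flag: helper may mutate the dynptr itself and its memory */
      	.arg1_type = ARG_PTR_TO_DYNPTR,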
      
      With these changes and renaming PTR_TO_DYNPTR to CONST_PTR_TO_DYNPTR to
      better reflect its usage, it can no longer be passed to helpers that
      initialize a dynptr, i.e. bpf_dynptr_from_mem, bpf_ringbuf_reserve_dynptr.
      
      A note to reviewers: in the code that does mark_stack_slots_dynptr
      and unmark_stack_slots_dynptr, we implicitly rely on the fact that a
      PTR_TO_STACK reg is the only case that can reach that code path, as
      CONST_PTR_TO_DYNPTR cannot be passed to helpers that don't set
      MEM_RDONLY, and in both of these cases such helpers won't be setting
      that flag.
      
      The next patch will add a couple of selftest cases to make sure this
      doesn't break.
      
      Fixes: 20571567 ("bpf: Add bpf_user_ringbuf_drain() helper")
      Acked-by: Joanne Koong <joannelkoong@gmail.com>
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Link: https://lore.kernel.org/r/20221207204141.308952-4-memxor@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Propagate errors from process_* checks in check_func_arg · ac50fe51
      Kumar Kartikeya Dwivedi authored
      Currently, we simply ignore the errors from process_spin_lock,
      process_timer_func, process_kptr_func, and process_dynptr_func.
      Instead, bubble the error up by storing and checking the err variable.
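
      A minimal sketch of the change's shape (the call signature is assumed
      for illustration; it varies per process_* helper):

      	/* before: the return value was silently dropped */
      	process_timer_func(env, regno, meta);

      	/* after: store and bubble up the error */
      	err = process_timer_func(env, regno, meta);
      	if (err)
      		return err;
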
      Acked-by: Joanne Koong <joannelkoong@gmail.com>
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Link: https://lore.kernel.org/r/20221207204141.308952-3-memxor@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Refactor ARG_PTR_TO_DYNPTR checks into process_dynptr_func · 6b75bd3d
      Kumar Kartikeya Dwivedi authored
      ARG_PTR_TO_DYNPTR is akin to ARG_PTR_TO_TIMER and ARG_PTR_TO_KPTR,
      where the underlying register type is subjected to more special
      checks to determine the type of object represented by the pointer and
      its state consistency.
      
      Move the dynptr checks to their own 'process_dynptr_func' function so
      that this is consistent and in line with existing code. This also
      makes it easier to reuse this code for kfunc handling.
      
      Then, reuse this consolidated function in kfunc dynptr handling too.
      Note that for kfuncs, the arg_type constraint of DYNPTR_TYPE_LOCAL has
      been lifted.
      Acked-by: David Vernet <void@manifault.com>
      Acked-by: Joanne Koong <joannelkoong@gmail.com>
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Link: https://lore.kernel.org/r/20221207204141.308952-2-memxor@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • Merge branch 'Misc optimizations for bpf mem allocator' · 6798152b
      Alexei Starovoitov authored
      Hou Tao says:
      
      ====================
      
      From: Hou Tao <houtao1@huawei.com>
      
      Hi,
      
      This patchset contains misc optimizations for the bpf mem allocator.
      Patch 1 fixes the OOM problem found while running the hash-table
      update benchmark from the qp-trie patchset [0]. The benchmark adds
      htab elements in batch and then deletes elements in batch, so freed
      objects stack up on free_by_rcu waiting for the expiration of the RCU
      grace period. There can be tens of thousands of such freed objects,
      and because they are not available for new allocation, adding htab
      elements keeps allocating new objects.
      
      For the benchmark command "./bench -w3 -d10 -a htab-update -p 16",
      even though the maximum number of htab entries is 16384, with
      key_size 255 and value_size 4, peak memory usage reaches 14GB or
      more. Increasing rcupdate.rcu_task_enqueue_lim decreases the peak
      memory to 860MB, but that is still far too much. Although the above
      case is contrived, it is better to fix it, and the fix is simple:
      just reuse the freed objects in free_by_rcu during allocation. After
      the fix, peak memory usage decreases to 26MB. Besides the above case,
      the memory blow-up problem is also possible when allocation and
      freeing are done on totally different CPUs. I'm trying to fix that
      blow-up by using a global per-cpu work item to free the objects in
      free_by_rcu in a timely manner, but it doesn't work very well yet and
      I am still digging into it.
      
      Patch 2 is a left-over patch from the rcu_trace_implies_rcu_gp()
      patchset [1]. After discussing with Paul [2], I think it is also safe
      to skip rcu_barrier() when rcu_trace_implies_rcu_gp() returns true.
      
      Comments are always welcome.
      
      Change Log:
      v2:
        * Patch 1: rephrase the commit message (Suggested by Yonghong & Alexei)
        * Add Acked-by tags for both patches 1 and 2
      
      v1: https://lore.kernel.org/bpf/20221206042946.686847-1-houtao@huaweicloud.com
      
      [0]: https://lore.kernel.org/bpf/20220924133620.4147153-13-houtao@huaweicloud.com/
      [1]: https://lore.kernel.org/bpf/20221014113946.965131-1-houtao@huaweicloud.com/
      [2]: https://lore.kernel.org/bpf/20221021185002.GP5600@paulmck-ThinkPad-P17-Gen-1/
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Skip rcu_barrier() if rcu_trace_implies_rcu_gp() is true · 822ed78f
      Hou Tao authored
      If there are pending RCU callbacks, free_mem_alloc() uses
      rcu_barrier_tasks_trace() and rcu_barrier() to wait for the pending
      __free_rcu_tasks_trace() and __free_rcu() callbacks.
      
      If rcu_trace_implies_rcu_gp() is true, there will be no pending
      __free_rcu(), so it is OK to skip rcu_barrier() as well.
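
      A sketch of the resulting wait logic (context and surrounding checks
      elided):

      	/* wait for pending __free_rcu_tasks_trace() callbacks */
      	rcu_barrier_tasks_trace();
      	/* an RCU-tasks-trace grace period may already imply an RCU
      	 * grace period, in which case __free_rcu() has run and
      	 * rcu_barrier() can be skipped
      	 */
      	if (!rcu_trace_implies_rcu_gp())
      		rcu_barrier();
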
      Acked-by: Yonghong Song <yhs@fb.com>
      Acked-by: Paul E. McKenney <paulmck@kernel.org>
      Signed-off-by: Hou Tao <houtao1@huawei.com>
      Link: https://lore.kernel.org/r/20221209010947.3130477-3-houtao@huaweicloud.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Reuse freed element in free_by_rcu during allocation · 0893d600
      Hou Tao authored
      When there are batched freeing operations on a specific CPU, part of
      the freed elements ((high_watermark - lower_watermark) / 2 + 1) are
      indirectly moved onto the waiting_for_gp list through the free_by_rcu
      list. After call_rcu_in_progress becomes false again, the remaining
      elements in the free_by_rcu list are moved to the waiting_for_gp list
      by the next invocation of free_bulk(). However, if the expiration of
      the RCU tasks trace grace period is relatively slow, no element in
      the free_by_rcu list will be moved.
      
      So instead of invoking __alloc_percpu_gfp() or kmalloc_node() to
      allocate a new object, have alloc_bulk() first check whether there is
      a freed element in the free_by_rcu list and reuse it if available.
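
      A hedged sketch of the reuse path in alloc_bulk() (helper names such
      as add_obj_to_free_list() are assumed; locking and refill details
      elided):

      	for (i = 0; i < cnt; i++) {
      		/* prefer an element parked on free_by_rcu over a fresh
      		 * __alloc_percpu_gfp()/kmalloc_node() allocation
      		 */
      		obj = __llist_del_first(&c->free_by_rcu);
      		if (!obj)
      			break; /* fall back to allocating new objects */
      		add_obj_to_free_list(c, obj);
      	}
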
      Acked-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Hou Tao <houtao1@huawei.com>
      Link: https://lore.kernel.org/r/20221209010947.3130477-2-houtao@huaweicloud.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  2. 08 Dec, 2022 8 commits
    • selftests/bpf: Bring test_offload.py back to life · e60db051
      Stanislav Fomichev authored
      Bpftool has a new extra libbpf_det_bind probing map that we need to
      exclude. Also skip trying to load the netdevsim module if it's
      already loaded (built-in).
      
      v2:
      - drop iproute2->bpftool changes (Toke)
      Signed-off-by: Stanislav Fomichev <sdf@google.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20221206232739.2504890-1-sdf@google.com
    • bpf: Fix comment error in fixup_kfunc_call function · c2cc0ce7
      Yang Jihong authored
      insn->imm for a kfunc is the address relative to __bpf_call_base, not
      __bpf_base_call. Fix the comment error.
      Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
      Link: https://lore.kernel.org/r/20221208013724.257848-1-yangjihong1@huawei.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Do not zero-extend kfunc return values · d35af0a7
      Björn Töpel authored
      In BPF, all global functions and BPF helpers return a 64-bit value.
      For kfunc calls, this is not the case: they can return, e.g., 32-bit
      values.
      
      The return register R0 for kfunc calls can therefore be marked as
      subreg_def != DEF_NOT_SUBREG. In general, if a register is marked
      with subreg_def != DEF_NOT_SUBREG, some archs (where
      bpf_jit_needs_zext() returns true) require the verifier to insert
      explicit zero-extension instructions.
      
      For kfunc calls, however, the caller should do any sign/zero
      extension of return values. In other words, the compiler is
      responsible for inserting the proper instructions, not the verifier.
      
      An example, provided by Yonghong Song:
      
      $ cat t.c
      extern unsigned foo(void);
      unsigned bar1(void) {
           return foo();
      }
      unsigned bar2(void) {
           if (foo()) return 10; else return 20;
      }
      
      $ clang -target bpf -mcpu=v3 -O2 -c t.c && llvm-objdump -d t.o
      t.o:    file format elf64-bpf
      
      Disassembly of section .text:
      
      0000000000000000 <bar1>:
      	0:       85 10 00 00 ff ff ff ff call -0x1
      	1:       95 00 00 00 00 00 00 00 exit
      
      0000000000000010 <bar2>:
      	2:       85 10 00 00 ff ff ff ff call -0x1
      	3:       bc 01 00 00 00 00 00 00 w1 = w0
      	4:       b4 00 00 00 14 00 00 00 w0 = 0x14
      	5:       16 01 01 00 00 00 00 00 if w1 == 0x0 goto +0x1 <LBB1_2>
      	6:       b4 00 00 00 0a 00 00 00 w0 = 0xa
      
      0000000000000038 <LBB1_2>:
      	7:       95 00 00 00 00 00 00 00 exit
      
      If the return value of 'foo()' is used in the BPF program, the proper
      zero-extension will be done.
      
      Currently, the verifier correctly marks, say, a 32-bit return value
      as subreg_def != DEF_NOT_SUBREG, but fails to perform the actual
      zero-extension, due to a verifier bug in
      opt_subreg_zext_lo32_rnd_hi32(): load_reg is not properly set to R0,
      and the following path is taken:
      
      		if (WARN_ON(load_reg == -1)) {
      			verbose(env, "verifier bug. zext_dst is set, but no reg is defined\n");
      			return -EFAULT;
      		}
      
      A longer discussion from v1 can be found in the link below.
      
      Correct the verifier by avoiding explicit zero-extension of R0 for
      kfunc calls. Note that R0 will still be marked as a sub-register for
      return values smaller than 64 bits.
      
      Fixes: 83a28819 ("bpf: Account for BPF_FETCH in insn_has_def32()")
      Link: https://lore.kernel.org/bpf/20221202103620.1915679-1-bjorn@kernel.org/
      Suggested-by: Yonghong Song <yhs@meta.com>
      Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/r/20221207103540.396496-1-bjorn@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • Merge branch 'Document some recent core kfunc additions' · 2d141236
      Alexei Starovoitov authored
      David Vernet says:
      
      ====================
      
      A series of recent patch sets introduced kfuncs that allowed struct
      task_struct and struct cgroup objects to be used as kptrs. These were
      introduced in [0], [1], and [2].
      
      [0]: https://lore.kernel.org/lkml/20221120051004.3605026-1-void@manifault.com/
      [1]: https://lore.kernel.org/lkml/20221122145300.251210-2-void@manifault.com/T/
      [2]: https://lore.kernel.org/lkml/20221122055458.173143-1-void@manifault.com/
      
      These are "core" kfuncs, in that they may be used by a wide variety of
      possible BPF tracepoint or struct_ops programs, and are defined in
      kernel/bpf/helpers.c. Even though as kfuncs they have no ABI stability
      guarantees, they should still be properly documented. This patch set
      adds that documentation.
      
      Some other kfuncs were added recently as well, such as
      bpf_rcu_read_lock() and bpf_rcu_read_unlock(). Those could and should be
      added to this "Core kfuncs" section as well in subsequent patch sets.
      
      Note that this patch set does not contain documentation for
      bpf_task_acquire_not_zero(), or bpf_task_kptr_get(). As discussed in
      [3], those kfuncs currently always return NULL pending resolution on how
      to properly protect their arguments using RCU.
      
      [3]: https://lore.kernel.org/all/20221206210538.597606-1-void@manifault.com/
      ---
      Changelog:
      v2 -> v3:
      - Don't document bpf_task_kptr_get(), and instead provide a more
        substantive example for bpf_cgroup_kptr_get().
      - Further clarify expected behavior of bpf_task_from_pid() in comments
        (Alexei)
      
      v1 -> v2:
      - Expand comment to specify that a map holds a reference to a task kptr
        if we don't end up releasing it (Alexei)
      - Just read task->pid instead of using a probed read (Alexei)
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf/docs: Document struct cgroup * kfuncs · 36aa10ff
      David Vernet authored
      bpf_cgroup_acquire(), bpf_cgroup_release(), bpf_cgroup_kptr_get(),
      and bpf_cgroup_ancestor() are kfuncs that were recently added to
      kernel/bpf/helpers.c. These are "core" kfuncs in that they're
      available for use in any tracepoint or struct_ops BPF program. Though
      they have no ABI stability guarantees, we should still document them.
      This patch adds a struct cgroup * subsection to the Core kfuncs
      section, describing each of these kfuncs.
      Signed-off-by: David Vernet <void@manifault.com>
      Link: https://lore.kernel.org/r/20221207204911.873646-3-void@manifault.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf/docs: Document struct task_struct * kfuncs · 25c5e92d
      David Vernet authored
      bpf_task_acquire(), bpf_task_release(), and bpf_task_from_pid() are
      kfuncs that were recently added to kernel/bpf/helpers.c. These are
      "core" kfuncs in that they're available for use in any tracepoint or
      struct_ops BPF program. Though they have no ABI stability guarantees,
      we should still document them. This patch adds a new Core kfuncs
      section to the BPF kfuncs doc, and adds entries for all of these task
      kfuncs.
      
      Note that bpf_task_kptr_get() is not documented, as it still returns
      NULL while we're working to resolve how it can use RCU to ensure struct
      task_struct * lifetime.
      Signed-off-by: David Vernet <void@manifault.com>
      Link: https://lore.kernel.org/r/20221207204911.873646-2-void@manifault.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: convert dynptr_fail and map_kptr_fail subtests to generic tester · 26c386ec
      Andrii Nakryiko authored
      Convert big chunks of the dynptr and map_kptr subtests to use the
      generic verification_tester. They are switched from using manually
      maintained tables of test cases, specifying program name and expected
      verifier error message, to btf_decl_tag-based annotations directly on
      the corresponding BPF programs: __failure to specify that a BPF
      program is expected to fail verification, and __msg() to specify the
      expected log message, as the sketch below shows.
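
      For illustration, an annotated failure test then looks roughly like
      this (the program body, map name, and expected message are invented
      for the sketch):

      	SEC("?raw_tp")
      	__failure __msg("Unreleased reference")
      	int ringbuf_missing_release(void *ctx)
      	{
      		struct bpf_dynptr ptr;

      		bpf_ringbuf_reserve_dynptr(&ringbuf, 64, 0, &ptr);
      		/* no bpf_ringbuf_discard_dynptr(): loading must fail */
      		return 0;
      	}
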
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Link: https://lore.kernel.org/r/20221207201648.2990661-2-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: add generic BPF program tester-loader · 537c3f66
      Andrii Nakryiko authored
      It has become a common pattern to have a collection of small BPF
      programs in one BPF object file, each representing one test case. On
      the user-space side of such tests we maintain a table of program
      names and expected failure or success, along with an optional
      expected verifier log message.
      
      This works, but each set of tests reimplements this mundane code over
      and over again, which is a waste of time for anyone trying to add a
      new set of tests. Furthermore, it's quite error prone, as it's way
      too easy to miss some entries in these manually maintained test
      tables (as evidenced by the dynptr_fail tests, in which the
      ringbuf_release_uninit_dynptr subtest was accidentally missed; this
      is fixed in the next patch).
      
      So this patch implements a generic test_loader, which accepts a
      skeleton name and handles the rest of the details: it opens and loads
      the BPF object file, making sure each program is tested in isolation.
      Optionally, each test case can specify the expected BPF verifier log
      message. In case of failure, the tester makes sure to report the
      verifier log; in verbose mode it reports the verifier log
      unconditionally.
      
      Now, the interesting deviation from existing custom implementations is
      the use of btf_decl_tag attribute to specify expected-to-fail vs
      expected-to-succeed markers and, optionally, expected log message
      directly next to BPF program source code, eliminating the need to
      manually create and update table of tests.
      
      We define a few macros wrapping btf_decl_tag with a convention that
      all btf_decl_tag values start with a "comment:" prefix, and then use
      a very simple "just_some_text_tag" or "some_key_name=<value>" pattern
      to define things like expected success/failure, the expected verifier
      message, and extra verifier log level (if necessary); see the sketch
      after this paragraph. This approach is demonstrated by the next
      patch, in which two existing sets of failure tests are converted.
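
      A sketch of such wrappers (the exact spellings are assumptions; the
      real macros live in the selftests headers):

      	#define __failure        __attribute__((btf_decl_tag("comment:test_expect_failure")))
      	#define __success        __attribute__((btf_decl_tag("comment:test_expect_success")))
      	#define __msg(msg)       __attribute__((btf_decl_tag("comment:test_expect_msg=" msg)))
      	#define __log_level(lvl) __attribute__((btf_decl_tag("comment:test_log_level=" #lvl)))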
      
      The tester supports both expected-to-fail and expected-to-succeed
      programs, though this patch set didn't convert any existing
      expected-to-succeed programs yet, as existing tests couple BPF
      program loading with their further execution through attach or
      test_prog_run. One way to allow testing scenarios like this would be
      the ability to specify a custom callback, executed for each
      successfully loaded BPF program. This is left for follow-up patches,
      after some more analysis of existing test cases.
      
      This test_loader is, hopefully, the start of a test_verifier-like
      runner, but integrated into the test_progs infrastructure. It will
      allow a much better "user experience" when defining low-level
      verification tests that can take advantage of all the libbpf-provided
      niceties on the BPF side: global variables, declarative maps, etc.
      All while having a choice of defining them in C or as BPF assembly
      (through __attribute__((naked)) functions and embedded asm),
      depending on what makes the most sense in each particular case. This
      will be explored in follow-up patches as well.
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20221207201648.2990661-1-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  3. 07 Dec, 2022 15 commits
  4. 06 Dec, 2022 8 commits
  5. 05 Dec, 2022 2 commits
    • selftests/bpf: Fix conflicts with built-in functions in bpf_iter_ksym · ab0350c7
      James Hilliard authored
      Both tolower and toupper are built-in C functions; we should not
      redefine them, as doing so can result in a build error.
      
      Fixes the following errors:
      progs/bpf_iter_ksym.c:10:20: error: conflicting types for built-in function 'tolower'; expected 'int(int)' [-Werror=builtin-declaration-mismatch]
         10 | static inline char tolower(char c)
            |                    ^~~~~~~
      progs/bpf_iter_ksym.c:5:1: note: 'tolower' is declared in header '<ctype.h>'
          4 | #include <bpf/bpf_helpers.h>
        +++ |+#include <ctype.h>
          5 |
      progs/bpf_iter_ksym.c:17:20: error: conflicting types for built-in function 'toupper'; expected 'int(int)' [-Werror=builtin-declaration-mismatch]
         17 | static inline char toupper(char c)
            |                    ^~~~~~~
      progs/bpf_iter_ksym.c:17:20: note: 'toupper' is declared in header '<ctype.h>'
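
      The fix, sketched below, is to rename the local helpers so they no
      longer shadow the reserved identifiers (the exact names chosen in the
      patch may differ):

      	static inline char to_lower(char c)
      	{
      		if (c >= 'A' && c <= 'Z')
      			c += 'a' - 'A';
      		return c;
      	}

      	static inline char to_upper(char c)
      	{
      		if (c >= 'a' && c <= 'z')
      			c -= 'a' - 'A';
      		return c;
      	}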
      
      See background on this sort of issue:
      https://stackoverflow.com/a/20582607
      https://gcc.gnu.org/bugzilla/show_bug.cgi?id=12213
      
      (C99, 7.1.3p1) "All identifiers with external linkage in any of the
      following subclauses (including the future library directions) are
      always reserved for use as identifiers with external linkage."
      
      This is documented behavior in GCC:
      https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-std-2
      Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
      Acked-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20221203010847.2191265-1-james.hilliard1@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf, sockmap: fix race in sock_map_free() · 0a182f8d
      Eric Dumazet authored
      sock_map_free() calls release_sock(sk) without owning a reference on
      the socket. This can cause a use-after-free, as syzbot found [1].

      Jakub Sitnicki already took care of a similar issue in
      sock_hash_free() in commit 75e68e5b ("bpf, sockhash: Synchronize
      delete from bucket list on map free").
      
      [1]
      refcount_t: decrement hit 0; leaking memory.
      WARNING: CPU: 0 PID: 3785 at lib/refcount.c:31 refcount_warn_saturate+0x17c/0x1a0 lib/refcount.c:31
      Modules linked in:
      CPU: 0 PID: 3785 Comm: kworker/u4:6 Not tainted 6.1.0-rc7-syzkaller-00103-gef4d3ea4 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
      Workqueue: events_unbound bpf_map_free_deferred
      RIP: 0010:refcount_warn_saturate+0x17c/0x1a0 lib/refcount.c:31
      Code: 68 8b 31 c0 e8 75 71 15 fd 0f 0b e9 64 ff ff ff e8 d9 6e 4e fd c6 05 62 9c 3d 0a 01 48 c7 c7 80 bb 68 8b 31 c0 e8 54 71 15 fd <0f> 0b e9 43 ff ff ff 89 d9 80 e1 07 80 c1 03 38 c1 0f 8c a2 fe ff
      RSP: 0018:ffffc9000456fb60 EFLAGS: 00010246
      RAX: eae59bab72dcd700 RBX: 0000000000000004 RCX: ffff8880207057c0
      RDX: 0000000000000000 RSI: 0000000000000201 RDI: 0000000000000000
      RBP: 0000000000000004 R08: ffffffff816fdabd R09: fffff520008adee5
      R10: fffff520008adee5 R11: 1ffff920008adee4 R12: 0000000000000004
      R13: dffffc0000000000 R14: ffff88807b1c6c00 R15: 1ffff1100f638dcf
      FS: 0000000000000000(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
      CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000001b30c30000 CR3: 000000000d08e000 CR4: 00000000003506f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
      <TASK>
      __refcount_dec include/linux/refcount.h:344 [inline]
      refcount_dec include/linux/refcount.h:359 [inline]
      __sock_put include/net/sock.h:779 [inline]
      tcp_release_cb+0x2d0/0x360 net/ipv4/tcp_output.c:1092
      release_sock+0xaf/0x1c0 net/core/sock.c:3468
      sock_map_free+0x219/0x2c0 net/core/sock_map.c:356
      process_one_work+0x81c/0xd10 kernel/workqueue.c:2289
      worker_thread+0xb14/0x1330 kernel/workqueue.c:2436
      kthread+0x266/0x300 kernel/kthread.c:376
      ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
      </TASK>
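
      The fix, sketched: pin the socket across the locked section so that
      release_sock() cannot run on a freed sock (structure illustrative,
      mirroring the sock_hash_free() fix referenced above):

      	sock_hold(sk);
      	lock_sock(sk);
      	rcu_read_lock();
      	sock_map_unref(sk, psk);
      	rcu_read_unlock();
      	release_sock(sk);
      	sock_put(sk);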
      
      Fixes: 7e81a353 ("bpf: Sockmap, ensure sock lock held during tear down")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Cc: Jakub Sitnicki <jakub@cloudflare.com>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Song Liu <songliubraving@fb.com>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/r/20221202111640.2745533-1-edumazet@google.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>