1. 17 Oct, 2019 17 commits
    • Andrii Nakryiko's avatar
      selftests/bpf: Add simple per-test targets to Makefile · 03dcb784
      Andrii Nakryiko authored
      Currently it's impossible to do `make test_progs` and have only
      test_progs be built, because all the binary targets are defined in terms
      of $(OUTPUT)/<binary>, and $(OUTPUT) is absolute path to current
      directory (or whatever gets overridden to by user).
      
      This patch adds simple re-directing targets for all test targets making
      it possible to do simple and nice `make test_progs` (and any other
      target).
      Signed-off-by: default avatarAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20191016060051.2024182-5-andriin@fb.com
      03dcb784
    • Andrii Nakryiko's avatar
      selftests/bpf: Switch test_maps to test_progs' test.h format · ee6c52e9
      Andrii Nakryiko authored
      Make test_maps use tests.h header format consistent with the one used by
      test_progs, to facilitate unification.
      Signed-off-by: default avatarAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20191016060051.2024182-4-andriin@fb.com
      ee6c52e9
    • Andrii Nakryiko's avatar
      selftests/bpf: Make CO-RE reloc test impartial to test_progs flavor · d25c5e23
      Andrii Nakryiko authored
      test_core_reloc_kernel test captures its own process name and validates
      it as part of the test. Given extra "flavors" of test_progs, this break
      for anything by default test_progs binary. Fix the test to cut out
      flavor part of the process name.
      
      Fixes: ee2eb063 ("selftests/bpf: Add BPF_CORE_READ and BPF_CORE_READ_STR_INTO macro tests")
      Signed-off-by: default avatarAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20191016060051.2024182-3-andriin@fb.com
      d25c5e23
    • Andrii Nakryiko's avatar
      selftests/bpf: Teach test_progs to cd into subdir · 0b6e71c3
      Andrii Nakryiko authored
      We are building a bunch of "flavors" of test_progs, e.g., w/ alu32 flag
      for Clang when building BPF object. test_progs setup is relying on
      having all the BPF object files and extra resources to be available in
      current working directory, though. But we actually build all these files
      into a separate sub-directory. Next set of patches establishes
      convention of naming "flavored" test_progs (and test runner binaries in
      general) as test_progs-flavor (e.g., test_progs-alu32), for each such
      extra flavor. This patch teaches test_progs binary to automatically
      detect its own extra flavor based on its argv[0], and if present, to
      change current directory to a flavor-specific subdirectory.
      Signed-off-by: default avatarAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20191016060051.2024182-2-andriin@fb.com
      0b6e71c3
    • Jakub Sitnicki's avatar
      selftests/bpf: Restore the netns after flow dissector reattach test · 8d285a3b
      Jakub Sitnicki authored
      flow_dissector_reattach test changes the netns we run in but does not
      restore it to the one we started in when finished. This interferes with
      tests that run after it. Fix it by restoring the netns when done.
      
      Fixes: f97eea17 ("selftests/bpf: Check that flow dissector can be re-attached")
      Reported-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Reported-by: default avatarAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: default avatarJakub Sitnicki <jakub@cloudflare.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20191017083752.30999-1-jakub@cloudflare.com
      8d285a3b
    • Daniel Borkmann's avatar
      Merge branch 'bpf-btf-trace' · 0142fdc8
      Daniel Borkmann authored
      Alexei Starovoitov says:
      
      ====================
      v2->v3:
      - while trying to adopt btf-based tracing in production service realized
        that disabling bpf_probe_read() was premature. The real tracing program
        needs to see much more than this type safe tracking can provide.
        With these patches the verifier will be able to see that skb->data
        is a pointer to 'u8 *', but it cannot possibly know how many bytes
        of it is readable. Hence bpf_probe_read() is necessary to do basic
        packet reading from tracing program. Some helper can be introduced to
        solve this particular problem, but there are other similar structures.
        Another issue is bitfield reading. The support for bitfields
        is coming to llvm. libbpf will be supporting it eventually as well,
        but there will be corner cases where bpf_probe_read() is necessary.
        The long term goal is still the same: get rid of probe_read eventually.
      - fixed build issue with clang reported by Nathan Chancellor.
      - addressed a ton of comments from Andrii.
        bitfields and arrays are explicitly unsupported in btf-based tracking.
        This will be improved in the future.
        Right now the verifier is more strict than necessary.
        In some cases it can fall back to 'scalar' instead of rejecting
        the program, but rejection today allows to make better decisions
        in the future.
      - adjusted testcase to demo bitfield and skb->data reading.
      
      v1->v2:
      - addressed feedback from Andrii and Eric. Thanks a lot for review!
      - added missing check at raw_tp attach time.
      - Andrii noticed that expected_attach_type cannot be reused.
        Had to introduce new field to bpf_attr.
      - cleaned up logging nicely by introducing bpf_log() helper.
      - rebased.
      
      Revolutionize bpf tracing and bpf C programming.
      C language allows any pointer to be typecasted to any other pointer
      or convert integer to a pointer.
      Though bpf verifier is operating at assembly level it has strict type
      checking for fixed number of types.
      Known types are defined in 'enum bpf_reg_type'.
      For example:
      PTR_TO_FLOW_KEYS is a pointer to 'struct bpf_flow_keys'
      PTR_TO_SOCKET is a pointer to 'struct bpf_sock',
      and so on.
      
      When it comes to bpf tracing there are no types to track.
      bpf+kprobe receives 'struct pt_regs' as input.
      bpf+raw_tracepoint receives raw kernel arguments as an array of u64 values.
      It was up to bpf program to interpret these integers.
      Typical tracing program looks like:
      int bpf_prog(struct pt_regs *ctx)
      {
          struct net_device *dev;
          struct sk_buff *skb;
          int ifindex;
      
          skb = (struct sk_buff *) ctx->di;
          bpf_probe_read(&dev, sizeof(dev), &skb->dev);
          bpf_probe_read(&ifindex, sizeof(ifindex), &dev->ifindex);
      }
      Addressing mistakes will not be caught by C compiler or by the verifier.
      The program above could have typecasted ctx->si to skb and page faulted
      on every bpf_probe_read().
      bpf_probe_read() allows reading any address and suppresses page faults.
      Typical program has hundreds of bpf_probe_read() calls to walk
      kernel data structures.
      Not only tracing program would be slow, but there was always a risk
      that bpf_probe_read() would read mmio region of memory and cause
      unpredictable hw behavior.
      
      With introduction of Compile Once Run Everywhere technology in libbpf
      and in LLVM and BPF Type Format (BTF) the verifier is finally ready
      for the next step in program verification.
      Now it can use in-kernel BTF to type check bpf assembly code.
      
      Equivalent program will look like:
      struct trace_kfree_skb {
          struct sk_buff *skb;
          void *location;
      };
      SEC("raw_tracepoint/kfree_skb")
      int trace_kfree_skb(struct trace_kfree_skb* ctx)
      {
          struct sk_buff *skb = ctx->skb;
          struct net_device *dev;
          int ifindex;
      
          __builtin_preserve_access_index(({
              dev = skb->dev;
              ifindex = dev->ifindex;
          }));
      }
      
      These patches teach bpf verifier to recognize kfree_skb's first argument
      as 'struct sk_buff *' because this is what kernel C code is doing.
      The bpf program cannot 'cheat' and say that the first argument
      to kfree_skb raw_tracepoint is some other type.
      The verifier will catch such type mismatch between bpf program
      assumption of kernel code and the actual type in the kernel.
      
      Furthermore skb->dev access is type tracked as well.
      The verifier can see which field of skb is being read
      in bpf assembly. It will match offset to type.
      If bpf program has code:
      struct net_device *dev = (void *)skb->len;
      C compiler will not complain and generate bpf assembly code,
      but the verifier will recognize that integer 'len' field
      is being accessed at offsetof(struct sk_buff, len) and will reject
      further dereference of 'dev' variable because it contains
      integer value instead of a pointer.
      
      Such sophisticated type tracking allows calling networking
      bpf helpers from tracing programs.
      This patchset allows calling bpf_skb_event_output() that dumps
      skb data into perf ring buffer.
      It greatly improves observability.
      Now users can not only see packet lenth of the skb
      about to be freed in kfree_skb() kernel function, but can
      dump it to user space via perf ring buffer using bpf helper
      that was previously available only to TC and socket filters.
      See patch 10 for full example.
      
      The end result is safer and faster bpf tracing.
      Safer - because type safe direct load can be used most of the time
      instead of bpf_probe_read().
      Faster - because direct loads are used to walk kernel data structures
      instead of bpf_probe_read() calls.
      Note that such loads can page fault and are supported by
      hidden bpf_probe_read() in interpreter and via exception table
      if program is JITed.
      ====================
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      0142fdc8
    • Alexei Starovoitov's avatar
      selftests/bpf: Add kfree_skb raw_tp test · 580d656d
      Alexei Starovoitov authored
      Load basic cls_bpf program.
      Load raw_tracepoint program and attach to kfree_skb raw tracepoint.
      Trigger cls_bpf via prog_test_run.
      At the end of test_run kernel will call kfree_skb
      which will trigger trace_kfree_skb tracepoint.
      Which will call our raw_tracepoint program.
      Which will take that skb and will dump it into perf ring buffer.
      Check that user space received correct packet.
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAndrii Nakryiko <andriin@fb.com>
      Acked-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20191016032505.2089704-12-ast@kernel.org
      580d656d
    • Alexei Starovoitov's avatar
      bpf: Check types of arguments passed into helpers · a7658e1a
      Alexei Starovoitov authored
      Introduce new helper that reuses existing skb perf_event output
      implementation, but can be called from raw_tracepoint programs
      that receive 'struct sk_buff *' as tracepoint argument or
      can walk other kernel data structures to skb pointer.
      
      In order to do that teach verifier to resolve true C types
      of bpf helpers into in-kernel BTF ids.
      The type of kernel pointer passed by raw tracepoint into bpf
      program will be tracked by the verifier all the way until
      it's passed into helper function.
      For example:
      kfree_skb() kernel function calls trace_kfree_skb(skb, loc);
      bpf programs receives that skb pointer and may eventually
      pass it into bpf_skb_output() bpf helper which in-kernel is
      implemented via bpf_skb_event_output() kernel function.
      Its first argument in the kernel is 'struct sk_buff *'.
      The verifier makes sure that types match all the way.
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAndrii Nakryiko <andriin@fb.com>
      Acked-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20191016032505.2089704-11-ast@kernel.org
      a7658e1a
    • Alexei Starovoitov's avatar
      bpf: Add support for BTF pointers to x86 JIT · 3dec541b
      Alexei Starovoitov authored
      Pointer to BTF object is a pointer to kernel object or NULL.
      Such pointers can only be used by BPF_LDX instructions.
      The verifier changed their opcode from LDX|MEM|size
      to LDX|PROBE_MEM|size to make JITing easier.
      The number of entries in extable is the number of BPF_LDX insns
      that access kernel memory via "pointer to BTF type".
      Only these load instructions can fault.
      Since x86 extable is relative it has to be allocated in the same
      memory region as JITed code.
      Allocate it prior to last pass of JITing and let the last pass populate it.
      Pointer to extable in bpf_prog_aux is necessary to make page fault
      handling fast.
      Page fault handling is done in two steps:
      1. bpf_prog_kallsyms_find() finds BPF program that page faulted.
         It's done by walking rb tree.
      2. then extable for given bpf program is binary searched.
      This process is similar to how page faulting is done for kernel modules.
      The exception handler skips over faulting x86 instruction and
      initializes destination register with zero. This mimics exact
      behavior of bpf_probe_read (when probe_kernel_read faults dest is zeroed).
      
      JITs for other architectures can add support in similar way.
      Until then they will reject unknown opcode and fallback to interpreter.
      
      Since extable should be aligned and placed near JITed code
      make bpf_jit_binary_alloc() return 4 byte aligned image offset,
      so that extable aligning formula in bpf_int_jit_compile() doesn't need
      to rely on internal implementation of bpf_jit_binary_alloc().
      On x86 gcc defaults to 16-byte alignment for regular kernel functions
      due to better performance. JITed code may be aligned to 16 in the future,
      but it will use 4 in the meantime.
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAndrii Nakryiko <andriin@fb.com>
      Acked-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20191016032505.2089704-10-ast@kernel.org
      3dec541b
    • Alexei Starovoitov's avatar
      bpf: Add support for BTF pointers to interpreter · 2a02759e
      Alexei Starovoitov authored
      Pointer to BTF object is a pointer to kernel object or NULL.
      The memory access in the interpreter has to be done via probe_kernel_read
      to avoid page faults.
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAndrii Nakryiko <andriin@fb.com>
      Acked-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20191016032505.2089704-9-ast@kernel.org
      2a02759e
    • Alexei Starovoitov's avatar
      bpf: Attach raw_tp program with BTF via type name · ac4414b5
      Alexei Starovoitov authored
      BTF type id specified at program load time has all
      necessary information to attach that program to raw tracepoint.
      Use kernel type name to find raw tracepoint.
      
      Add missing CHECK_ATTR() condition.
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAndrii Nakryiko <andriin@fb.com>
      Acked-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20191016032505.2089704-8-ast@kernel.org
      ac4414b5
    • Alexei Starovoitov's avatar
      bpf: Implement accurate raw_tp context access via BTF · 9e15db66
      Alexei Starovoitov authored
      libbpf analyzes bpf C program, searches in-kernel BTF for given type name
      and stores it into expected_attach_type.
      The kernel verifier expects this btf_id to point to something like:
      typedef void (*btf_trace_kfree_skb)(void *, struct sk_buff *skb, void *loc);
      which represents signature of raw_tracepoint "kfree_skb".
      
      Then btf_ctx_access() matches ctx+0 access in bpf program with 'skb'
      and 'ctx+8' access with 'loc' arguments of "kfree_skb" tracepoint.
      In first case it passes btf_id of 'struct sk_buff *' back to the verifier core
      and 'void *' in second case.
      
      Then the verifier tracks PTR_TO_BTF_ID as any other pointer type.
      Like PTR_TO_SOCKET points to 'struct bpf_sock',
      PTR_TO_TCP_SOCK points to 'struct bpf_tcp_sock', and so on.
      PTR_TO_BTF_ID points to in-kernel structs.
      If 1234 is btf_id of 'struct sk_buff' in vmlinux's BTF
      then PTR_TO_BTF_ID#1234 points to one of in kernel skbs.
      
      When PTR_TO_BTF_ID#1234 is dereferenced (like r2 = *(u64 *)r1 + 32)
      the btf_struct_access() checks which field of 'struct sk_buff' is
      at offset 32. Checks that size of access matches type definition
      of the field and continues to track the dereferenced type.
      If that field was a pointer to 'struct net_device' the r2's type
      will be PTR_TO_BTF_ID#456. Where 456 is btf_id of 'struct net_device'
      in vmlinux's BTF.
      
      Such verifier analysis prevents "cheating" in BPF C program.
      The program cannot cast arbitrary pointer to 'struct sk_buff *'
      and access it. C compiler would allow type cast, of course,
      but the verifier will notice type mismatch based on BPF assembly
      and in-kernel BTF.
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAndrii Nakryiko <andriin@fb.com>
      Acked-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20191016032505.2089704-7-ast@kernel.org
      9e15db66
    • Alexei Starovoitov's avatar
      libbpf: Auto-detect btf_id of BTF-based raw_tracepoints · f75a697e
      Alexei Starovoitov authored
      It's a responsiblity of bpf program author to annotate the program
      with SEC("tp_btf/name") where "name" is a valid raw tracepoint.
      The libbpf will try to find "name" in vmlinux BTF and error out
      in case vmlinux BTF is not available or "name" is not found.
      If "name" is indeed a valid raw tracepoint then in-kernel BTF
      will have "btf_trace_##name" typedef that points to function
      prototype of that raw tracepoint. BTF description captures
      exact argument the kernel C code is passing into raw tracepoint.
      The kernel verifier will check the types while loading bpf program.
      
      libbpf keeps BTF type id in expected_attach_type, but since
      kernel ignores this attribute for tracing programs copy it
      into attach_btf_id attribute before loading.
      
      Later the kernel will use prog->attach_btf_id to select raw tracepoint
      during bpf_raw_tracepoint_open syscall command.
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAndrii Nakryiko <andriin@fb.com>
      Acked-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20191016032505.2089704-6-ast@kernel.org
      f75a697e
    • Alexei Starovoitov's avatar
      bpf: Add attach_btf_id attribute to program load · ccfe29eb
      Alexei Starovoitov authored
      Add attach_btf_id attribute to prog_load command.
      It's similar to existing expected_attach_type attribute which is
      used in several cgroup based program types.
      Unfortunately expected_attach_type is ignored for
      tracing programs and cannot be reused for new purpose.
      Hence introduce attach_btf_id to verify bpf programs against
      given in-kernel BTF type id at load time.
      It is strictly checked to be valid for raw_tp programs only.
      In a later patches it will become:
      btf_id == 0 semantics of existing raw_tp progs.
      btd_id > 0 raw_tp with BTF and additional type safety.
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAndrii Nakryiko <andriin@fb.com>
      Acked-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20191016032505.2089704-5-ast@kernel.org
      ccfe29eb
    • Alexei Starovoitov's avatar
      bpf: Process in-kernel BTF · 8580ac94
      Alexei Starovoitov authored
      If in-kernel BTF exists parse it and prepare 'struct btf *btf_vmlinux'
      for further use by the verifier.
      In-kernel BTF is trusted just like kallsyms and other build artifacts
      embedded into vmlinux.
      Yet run this BTF image through BTF verifier to make sure
      that it is valid and it wasn't mangled during the build.
      If in-kernel BTF is incorrect it means either gcc or pahole or kernel
      are buggy. In such case disallow loading BPF programs.
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAndrii Nakryiko <andriin@fb.com>
      Acked-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20191016032505.2089704-4-ast@kernel.org
      8580ac94
    • Alexei Starovoitov's avatar
      bpf: Add typecast to bpf helpers to help BTF generation · 7c6a469e
      Alexei Starovoitov authored
      When pahole converts dwarf to btf it emits only used types.
      Wrap existing bpf helper functions into typedef and use it in
      typecast to make gcc emits this type into dwarf.
      Then pahole will convert it to btf.
      The "btf_#name_of_helper" types will be used to figure out
      types of arguments of bpf helpers.
      The generated code before and after is the same.
      Only dwarf and btf sections are different.
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAndrii Nakryiko <andriin@fb.com>
      Acked-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Acked-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20191016032505.2089704-3-ast@kernel.org
      7c6a469e
    • Alexei Starovoitov's avatar
      bpf: Add typecast to raw_tracepoints to help BTF generation · e8c423fb
      Alexei Starovoitov authored
      When pahole converts dwarf to btf it emits only used types.
      Wrap existing __bpf_trace_##template() function into
      btf_trace_##template typedef and use it in type cast to
      make gcc emits this type into dwarf. Then pahole will convert it to btf.
      The "btf_trace_" prefix will be used to identify BTF enabled raw tracepoints.
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAndrii Nakryiko <andriin@fb.com>
      Acked-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Acked-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20191016032505.2089704-2-ast@kernel.org
      e8c423fb
  2. 16 Oct, 2019 2 commits
    • Song Liu's avatar
      bpf/stackmap: Fix deadlock with rq_lock in bpf_get_stack() · eac9153f
      Song Liu authored
      bpf stackmap with build-id lookup (BPF_F_STACK_BUILD_ID) can trigger A-A
      deadlock on rq_lock():
      
      rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
      [...]
      Call Trace:
       try_to_wake_up+0x1ad/0x590
       wake_up_q+0x54/0x80
       rwsem_wake+0x8a/0xb0
       bpf_get_stack+0x13c/0x150
       bpf_prog_fbdaf42eded9fe46_on_event+0x5e3/0x1000
       bpf_overflow_handler+0x60/0x100
       __perf_event_overflow+0x4f/0xf0
       perf_swevent_overflow+0x99/0xc0
       ___perf_sw_event+0xe7/0x120
       __schedule+0x47d/0x620
       schedule+0x29/0x90
       futex_wait_queue_me+0xb9/0x110
       futex_wait+0x139/0x230
       do_futex+0x2ac/0xa50
       __x64_sys_futex+0x13c/0x180
       do_syscall_64+0x42/0x100
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      This can be reproduced by:
      1. Start a multi-thread program that does parallel mmap() and malloc();
      2. taskset the program to 2 CPUs;
      3. Attach bpf program to trace_sched_switch and gather stackmap with
         build-id, e.g. with trace.py from bcc tools:
         trace.py -U -p <pid> -s <some-bin,some-lib> t:sched:sched_switch
      
      A sample reproducer is attached at the end.
      
      This could also trigger deadlock with other locks that are nested with
      rq_lock.
      
      Fix this by checking whether irqs are disabled. Since rq_lock and all
      other nested locks are irq safe, it is safe to do up_read() when irqs are
      not disable. If the irqs are disabled, postpone up_read() in irq_work.
      
      Fixes: 615755a7 ("bpf: extend stackmap to save binary_build_id+offset instead of address")
      Signed-off-by: default avatarSong Liu <songliubraving@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20191014171223.357174-1-songliubraving@fb.com
      
      Reproducer:
      ============================ 8< ============================
      
      char *filename;
      
      void *worker(void *p)
      {
              void *ptr;
              int fd;
              char *pptr;
      
              fd = open(filename, O_RDONLY);
              if (fd < 0)
                      return NULL;
              while (1) {
                      struct timespec ts = {0, 1000 + rand() % 2000};
      
                      ptr = mmap(NULL, 4096 * 64, PROT_READ, MAP_PRIVATE, fd, 0);
                      usleep(1);
                      if (ptr == MAP_FAILED) {
                              printf("failed to mmap\n");
                              break;
                      }
                      munmap(ptr, 4096 * 64);
                      usleep(1);
                      pptr = malloc(1);
                      usleep(1);
                      pptr[0] = 1;
                      usleep(1);
                      free(pptr);
                      usleep(1);
                      nanosleep(&ts, NULL);
              }
              close(fd);
              return NULL;
      }
      
      int main(int argc, char *argv[])
      {
              void *ptr;
              int i;
              pthread_t threads[THREAD_COUNT];
      
              if (argc < 2)
                      return 0;
      
              filename = argv[1];
      
              for (i = 0; i < THREAD_COUNT; i++) {
                      if (pthread_create(threads + i, NULL, worker, NULL)) {
                              fprintf(stderr, "Error creating thread\n");
                              return 0;
                      }
              }
      
              for (i = 0; i < THREAD_COUNT; i++)
                      pthread_join(threads[i], NULL);
              return 0;
      }
      ============================ 8< ============================
      eac9153f
    • Jakub Sitnicki's avatar
      scripts/bpf: Emit an #error directive known types list needs updating · 456a513b
      Jakub Sitnicki authored
      Make the compiler report a clear error when bpf_helpers_doc.py needs
      updating rather than rely on the fact that Clang fails to compile
      English:
      
      ../../../lib/bpf/bpf_helper_defs.h:2707:1: error: unknown type name 'Unrecognized'
      Unrecognized type 'struct bpf_inet_lookup', please add it to known types!
      Signed-off-by: default avatarJakub Sitnicki <jakub@cloudflare.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20191016085811.11700-1-jakub@cloudflare.com
      456a513b
  3. 15 Oct, 2019 14 commits
  4. 14 Oct, 2019 6 commits
    • David S. Miller's avatar
      Merge branch 'PTP-driver-refactoring-for-SJA1105-DSA' · 85a83a8f
      David S. Miller authored
      Vladimir Oltean says:
      
      ====================
      PTP driver refactoring for SJA1105 DSA
      
      This series creates a better separation between the driver core and the
      PTP portion. Therefore, users who are not interested in PTP can get a
      simpler and smaller driver by compiling it out.
      
      This is in preparation for further patches: SPI transfer timestamping,
      synchronizing the hardware clock (as opposed to keeping it
      free-running), PPS input/output, etc.
      ====================
      Acked-by: default avatarRichard Cochran <richardcochran@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      85a83a8f
    • Vladimir Oltean's avatar
      net: dsa: sja1105: Change the PTP command access pattern · 66427778
      Vladimir Oltean authored
      The PTP command register contains enable bits for:
      - Putting the 64-bit PTPCLKVAL register in add/subtract or write mode
      - Taking timestamps off of the corrected vs free-running clock
      - Starting/stopping the TTEthernet scheduling
      - Starting/stopping PPS output
      - Resetting the switch
      
      When a command needs to be issued (e.g. "change the PTPCLKVAL from write
      mode to add/subtract mode"), one cannot simply write to the command
      register setting the PTPCLKADD bit to 1, because that would zeroize the
      other settings. One also cannot do a read-modify-write (that would be
      too easy for this hardware) because not all bits of the command register
      are readable over SPI.
      
      So this leaves us with the only option of keeping the value of the PTP
      command register in the driver, and operating on that.
      
      Actually there are 2 types of PTP operations now:
      - Operations that modify the cached PTP command. These operate on
        ptp_data->cmd as a pointer.
      - Operations that apply all previously cached PTP settings, but don't
        otherwise cache what they did themselves. The sja1105_ptp_reset
        function is such an example. It copies the ptp_data->cmd on stack
        before modifying and writing it to SPI.
      
      This practically means that struct sja1105_ptp_cmd is no longer an
      implementation detail, since it needs to be stored in full into struct
      sja1105_ptp_data, and hence in struct sja1105_private. So the (*ptp_cmd)
      function prototype can change and take struct sja1105_ptp_cmd as second
      argument now.
      Signed-off-by: default avatarVladimir Oltean <olteanv@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      66427778
    • Vladimir Oltean's avatar
      net: dsa: sja1105: Move PTP data to its own private structure · a9d6ed7a
      Vladimir Oltean authored
      This is a non-functional change with 2 goals (both for the case when
      CONFIG_NET_DSA_SJA1105_PTP is not enabled):
      
      - Reduce the size of the sja1105_private structure.
      - Make the PTP code more self-contained.
      
      Leaving priv->ptp_data.lock to be initialized in sja1105_main.c is not a
      leftover: it will be used in a future patch "net: dsa: sja1105: Restore
      PTP time after switch reset".
      Signed-off-by: default avatarVladimir Oltean <olteanv@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a9d6ed7a
    • Vladimir Oltean's avatar
      net: dsa: sja1105: Make all public PTP functions take dsa_switch as argument · 61c77126
      Vladimir Oltean authored
      The new rule (as already started for sja1105_tas.h) is for functions of
      optional driver components (ones which may be disabled via Kconfig - PTP
      and TAS) to take struct dsa_switch *ds instead of struct sja1105_private
      *priv as first argument.
      
      This is so that forward-declarations of struct sja1105_private can be
      avoided.
      
      So make sja1105_ptp.h the second user of this rule.
      Signed-off-by: default avatarVladimir Oltean <olteanv@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      61c77126
    • Vladimir Oltean's avatar
      net: dsa: sja1105: Get rid of global declaration of struct ptp_clock_info · 5b3ae43a
      Vladimir Oltean authored
      We need priv->ptp_caps to hold a structure and not just a pointer,
      because we use container_of in the various PTP callbacks.
      
      Therefore, the sja1105_ptp_caps structure declared in the global memory
      of the driver serves no further purpose after copying it into
      priv->ptp_caps.
      
      So just populate priv->ptp_caps with the needed operations and remove
      sja1105_ptp_caps.
      Signed-off-by: default avatarVladimir Oltean <olteanv@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5b3ae43a
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · a98d62c3
      David S. Miller authored
      Alexei Starovoitov says:
      
      ====================
      pull-request: bpf-next 2019-10-14
      
      The following pull-request contains BPF updates for your *net-next* tree.
      
      12 days of development and
      85 files changed, 1889 insertions(+), 1020 deletions(-)
      
      The main changes are:
      
      1) auto-generation of bpf_helper_defs.h, from Andrii.
      
      2) split of bpf_helpers.h into bpf_{helpers, helper_defs, endian, tracing}.h
         and move into libbpf, from Andrii.
      
      3) Track contents of read-only maps as scalars in the verifier, from Andrii.
      
      4) small x86 JIT optimization, from Daniel.
      
      5) cross compilation support, from Ivan.
      
      6) bpf flow_dissector enhancements, from Jakub and Stanislav.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a98d62c3
  5. 13 Oct, 2019 1 commit