- 25 Nov, 2019 40 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller authored
Daniel Borkmann says: ==================== pull-request: bpf-next 2019-11-26 The following pull-request contains BPF updates for your *net-next* tree. We've added 2 non-merge commits during the last 1 day(s) which contain a total of 2 files changed, 14 insertions(+), 3 deletions(-). The main changes, 2 small fixes are: 1) Fix libbpf out of tree compilation which complained about unknown u32 type used in libbpf_find_vmlinux_btf_id() which needs to be __u32 instead, from Andrii Nakryiko. 2) Follow-up fix for the prior BPF mmap series where kbuild bot complained about missing vmalloc_user_node_flags() for no-MMU, also from Andrii Nakryiko. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller authored
Merge in networking bug fixes for merge window. Signed-off-by: David S. Miller <davem@davemloft.net>
-
Andrii Nakryiko authored
u32 is not defined for libbpf when compiled outside of kernel sources (e.g., in Github projection). Use __u32 instead. Fixes: b8c54ea4 ("libbpf: Add support to attach to fentry/fexit tracing progs") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191125212948.1163343-1-andriin@fb.com
-
Andrii Nakryiko authored
To fix build with !CONFIG_MMU, implement it for no-MMU configurations as well. Fixes: fc970227 ("bpf: Add mmap() support for BPF_MAP_TYPE_ARRAY") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20191123220835.1237773-1-andriin@fb.com
-
Jouni Hogander authored
Slip_open doesn't clean-up device which registration failed from the slip_devs device list. On next open after failure this list is iterated and freed device is accessed. Fix this by calling sl_free_netdev in error path. Here is the trace from the Syzbot: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x197/0x210 lib/dump_stack.c:118 print_address_description.constprop.0.cold+0xd4/0x30b mm/kasan/report.c:374 __kasan_report.cold+0x1b/0x41 mm/kasan/report.c:506 kasan_report+0x12/0x20 mm/kasan/common.c:634 __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132 sl_sync drivers/net/slip/slip.c:725 [inline] slip_open+0xecd/0x11b7 drivers/net/slip/slip.c:801 tty_ldisc_open.isra.0+0xa3/0x110 drivers/tty/tty_ldisc.c:469 tty_set_ldisc+0x30e/0x6b0 drivers/tty/tty_ldisc.c:596 tiocsetd drivers/tty/tty_io.c:2334 [inline] tty_ioctl+0xe8d/0x14f0 drivers/tty/tty_io.c:2594 vfs_ioctl fs/ioctl.c:46 [inline] file_ioctl fs/ioctl.c:509 [inline] do_vfs_ioctl+0xdb6/0x13e0 fs/ioctl.c:696 ksys_ioctl+0xab/0xd0 fs/ioctl.c:713 __do_sys_ioctl fs/ioctl.c:720 [inline] __se_sys_ioctl fs/ioctl.c:718 [inline] __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718 do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe Fixes: 3b5a3997 ("slip: Fix memory leak in slip_open error path") Reported-by: syzbot+4d5170758f3762109542@syzkaller.appspotmail.com Cc: David Miller <davem@davemloft.net> Cc: Oliver Hartkopp <socketcan@hartkopp.net> Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Jouni Hogander <jouni.hogander@unikie.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Oleksij Rempel authored
This function was using configuration of port 0 in devicetree for all ports. In case CPU port was not 0, the delay settings was ignored. This resulted not working communication between CPU and the switch. Fixes: f5b8631c ("net: dsa: sja1105: Error out if RGMII delays are requested in DT") Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Menglong Dong authored
While enqueueing a broadcast skb to port->bc_queue, schedule_work() is called to add port->bc_work, which processes the skbs in bc_queue, to "events" work queue. If port->bc_queue is full, the skb will be discarded and schedule_work(&port->bc_work) won't be called. However, if port->bc_queue is full and port->bc_work is not running or pending, port->bc_queue will keep full and schedule_work() won't be called any more, and all broadcast skbs to macvlan will be discarded. This case can happen: macvlan_process_broadcast() is the pending function of port->bc_work, it moves all the skbs in port->bc_queue to the queue "list", and processes the skbs in "list". During this, new skbs will keep being added to port->bc_queue in macvlan_broadcast_enqueue(), and port->bc_queue may already full when macvlan_process_broadcast() return. This may happen, especially when there are a lot of real-time threads and the process is preempted. Fix this by calling schedule_work(&port->bc_work) even if port->bc_work is full in macvlan_broadcast_enqueue(). Fixes: 412ca155 ("macvlan: Move broadcasts into a work queue") Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Po Liu authored
The ENETC hardware support the Credit Based Shaper(CBS) which part of the IEEE-802.1Qav. The CBS driver was loaded by the sch_cbs interface when set in the QOS in the kernel. Here is an example command to set 20Mbits bandwidth in 1Gbits port for taffic class 7: tc qdisc add dev eth0 root handle 1: mqprio \ num_tc 8 map 0 1 2 3 4 5 6 7 hw 1 tc qdisc replace dev eth0 parent 1:8 cbs \ locredit -1470 hicredit 30 \ sendslope -980000 idleslope 20000 offload 1 Signed-off-by: Po Liu <Po.Liu@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Heiner Kallweit authored
Add helpers to make locking/unlocking the MDIO bus easier. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David Bauer authored
Geert Uytterhoeven reported that using devm_reset_controller_get leads to a WARNING when probing a reset-controlled PHY. This is because the device devm_reset_controller_get gets supplied is not actually the one being probed. Acquire an unmanaged reset-control as well as free the reset_control on unregister to fix this. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> CC: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David Bauer <mail@david-bauer.net> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller authored
Alexei Starovoitov says: ==================== pull-request: bpf-next 2019-11-24 The following pull-request contains BPF updates for your *net-next* tree. We've added 27 non-merge commits during the last 4 day(s) which contain a total of 50 files changed, 2031 insertions(+), 548 deletions(-). The main changes are: 1) Optimize bpf_tail_call() from retpoline-ed indirect jump to direct jump, from Daniel. 2) Support global variables in libbpf, from Andrii. 3) Cleanup selftests with BPF_TRACE_x() macro, from Martin. 4) Fix devmap hash, from Toke. 5) Fix register bounds after 32-bit conditional jumps, from Yonghong. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2019-11-24 Here's one last bluetooth-next pull request for the 5.5 kernel: - Fix BDADDR_PROPERTY & INVALID_BDADDR quirk handling - Added support for BCM4334B0 and BCM4335A0 controllers - A few other smaller fixes related to locking and memory leaks ==================== Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
-
Andreas K. Besslein authored
This enables the use of SW timestamping. ax88179_178a uses the usbnet transmit function usbnet_start_xmit() which implements software timestamping. ax88179_178a overrides ethtool_ops but missed to set .get_ts_info. This caused SOF_TIMESTAMPING_TX_SOFTWARE capability to be not available. Signed-off-by: Andreas K. Besslein <besslein.andreas@gmail.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
-
Jakub Kicinski authored
Ido Schimmel says: ==================== mlxsw: Two small updates Patch #1 from Petr handles a corner case in GRE tunnel offload. Patch #2 from Amit fixes a recent issue where the driver was programming the device to use an adjacency index (for a nexthop) that was not properly initialized. ==================== Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
-
Amit Cohen authored
When mlxsw_sp_adj_discard_write() is called for the first time, the value stored in 'mlxsw_sp->router->adj_discard_index' is invalid, as indicated by 'mlxsw_sp->router->adj_discard_index_valid' being set to 'false'. In this case, we should not use the value initially stored in 'mlxsw_sp->router->adj_discard_index' (0) and instead use the value allocated later in the function. Fixes: 983db619 ("mlxsw: spectrum_router: Allocate discard adjacency entry when needed") Signed-off-by: Amit Cohen <amitc@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
-
Petr Machata authored
When a GRE tunnel is bound to an underlay netdevice and that netdevice is moved to a different VRF, that could cause two tunnels to have the same underlay local address in the same VRF. Linux in this situation dispatches the traffic according to the tunnel key (or lack thereof), but that cannot be offloaded to Spectrum devices. Detect this situation and unoffload the two impacted tunnels when it happens. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
-
Daniel Borkmann authored
Given that we have BPF_MOD_NOP_TO_{CALL,JUMP}, BPF_MOD_{CALL,JUMP}_TO_NOP and BPF_MOD_{CALL,JUMP}_TO_{CALL,JUMP} poke types and that we also pass in old_addr as well as new_addr, it's a bit redundant and unnecessarily complicates __bpf_arch_text_poke() itself since we can derive the same from the *_addr that were passed in. Hence simplify and use BPF_MOD_{CALL,JUMP} as types which also allows to clean up call-sites. In addition to that, __bpf_arch_text_poke() currently verifies that text matches expected old_insn before we invoke text_poke_bp(). Also add a check on new_insn and skip rewrite if it already matches. Reason why this is rather useful is that it avoids making any special casing in prog_array_map_poke_run() when old and new prog were NULL and has the benefit that also for this case we perform a check on text whether it really matches our expectations. Suggested-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/fcb00a2b0b288d6c73de4ef58116a821c8fe8f2f.1574555798.git.daniel@iogearbox.net
-
Martin KaFai Lau authored
For BPF_PROG_TYPE_TRACING, the bpf_prog's ctx is an array of u64. This patch borrows the idea from BPF_CALL_x in filter.h to convert a u64 to the arg type of the traced function. The new BPF_TRACE_x has an arg to specify the return type of a bpf_prog. It will be used in the future TCP-ops bpf_prog that may return "void". The new macros are defined in the new header file "bpf_trace_helpers.h". It is under selftests/bpf/ for now. It could be moved to libbpf later after seeing more upcoming non-tracing use cases. The tests are changed to use these new macros also. Hence, the k[s]u8/16/32/64 are no longer needed and they are removed from the bpf_helpers.h. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191123202504.1502696-1-kafai@fb.com
-
Daniel Borkmann authored
Add a definition of bpf_jit_blinding_enabled() when CONFIG_BPF_JIT is not set in order to fix a recent build regression: [...] CC kernel/bpf/verifier.o CC kernel/bpf/inode.o kernel/bpf/verifier.c: In function ‘fixup_bpf_calls’: kernel/bpf/verifier.c:9132:25: error: implicit declaration of function ‘bpf_jit_blinding_enabled’; did you mean ‘bpf_jit_kallsyms_enabled’? [-Werror=implicit-function-declaration] 9132 | bool expect_blinding = bpf_jit_blinding_enabled(prog); | ^~~~~~~~~~~~~~~~~~~~~~~~ | bpf_jit_kallsyms_enabled CC kernel/bpf/helpers.o CC kernel/bpf/hashtab.o [...] Fixes: d2e4c1e6 ("bpf: Constant map key tracking for prog array pokes") Reported-by: Jakub Sitnicki <jakub@cloudflare.com> Reported-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/40baf8f3507cac4851a310578edfb98ce73b5605.1574541375.git.daniel@iogearbox.net
-
Alexei Starovoitov authored
Daniel Borkmann says: ==================== This gets rid of indirect jumps for BPF tail calls whenever possible. The series adds emission for *direct* jumps for tail call maps in order to avoid the retpoline overhead from a493a87f ("bpf, x64: implement retpoline for tail call") for situations that allow for it, meaning, for known constant keys at verification time which are used as index into the tail call map. See patch 7/8 for more general details. Thanks! v1 -> v2: - added more test cases - u8 ip_stable -> bool (Andrii) - removed bpf_map_poke_{un,}lock and simplified the code (Andrii) - added break into prog_array_map_poke_untrack since there's just one prog (Andrii) - fixed typo: for for in commit msg (Andrii) - reworked __bpf_arch_text_poke (Andrii) - added subtests, and comment on tests themselves, NULL-NULL transistion (Andrii) - in constant map key tracking I've moved the map_poke_track callback to once we've finished creating the poke tab as otherwise concurrent access from tail call map would blow up (since we realloc the table) rfc -> v1: - Applied Alexei's and Andrii's feeback from https://lore.kernel.org/bpf/cover.1573779287.git.daniel@iogearbox.net/T/#t ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Daniel Borkmann authored
Add several BPF kselftest cases for tail calls which test the various patch directions, and that multiple locations are patched in same and different programs. # ./test_progs -n 45 #45/1 tailcall_1:OK #45/2 tailcall_2:OK #45/3 tailcall_3:OK #45/4 tailcall_4:OK #45/5 tailcall_5:OK #45 tailcalls:OK Summary: 1/5 PASSED, 0 SKIPPED, 0 FAILED I've also verified the JITed dump after each of the rewrite cases that it matches expectations. Also regular test_verifier suite passes fine which contains further tail call tests: # ./test_verifier [...] Summary: 1563 PASSED, 0 SKIPPED, 0 FAILED Checked under JIT, interpreter and JIT + hardening. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/3d6cbecbeb171117dccfe153306e479798fb608d.1574452833.git.daniel@iogearbox.net
-
Daniel Borkmann authored
Add initial code emission for *direct* jumps for tail call maps in order to avoid the retpoline overhead from a493a87f ("bpf, x64: implement retpoline for tail call") for situations that allow for it, meaning, for known constant keys at verification time which are used as index into the tail call map. In case of Cilium which makes heavy use of tail calls, constant keys are used in the vast majority, only for a single occurrence we use a dynamic key. High level outline is that if the target prog is NULL in the map, we emit a 5-byte nop for the fall-through case and if not, we emit a 5-byte direct relative jmp to the target bpf_func + skipped prologue offset. Later during runtime, we patch these 5-byte nop/jmps upon tail call map update or deletions dynamically. Note that on x86-64 the direct jmp works as we reuse the same stack frame and skip prologue (as opposed to some other JIT implementations). One of the issues is that the tail call map slots can change at any given time even during JITing. Therefore, we have two passes: i) emit nops for all patchable locations during main JITing phase until we declare prog->jited = 1 eventually. At this point the image is stable, not public yet and with all jmps disabled. While JITing, we collect additional info like poke->ip in order to remember the patch location for later modifications. In ii) bpf_tail_call_direct_fixup() walks over the progs poke_tab, locks the tail call maps poke_mutex to prevent from parallel updates and patches in the right locations via __bpf_arch_text_poke(). Note, the main bpf_arch_text_poke() cannot be used at this point since we're not yet exposed to kallsyms. For the update we use plain memcpy() since the image is not public and still in read-write mode. After patching, we activate that poke entry through poke->ip_stable. Meaning, at this point any tail call map updates/deletions are not going to ignore that poke entry anymore. Then, bpf_arch_text_poke() might still occur on the read-write image until we finally locked it as read-only. Both modifications on the given image are under text_mutex to avoid interference with each other when update requests come in in parallel for different tail call maps (current one we have locked in JIT and different one where poke->ip_stable was already set). Example prog: # ./bpftool p d x i 1655 0: (b7) r3 = 0 1: (18) r2 = map[id:526] 3: (85) call bpf_tail_call#12 4: (b7) r0 = 1 5: (95) exit Before: # ./bpftool p d j i 1655 0xffffffffc076e55c: 0: nopl 0x0(%rax,%rax,1) 5: push %rbp 6: mov %rsp,%rbp 9: sub $0x200,%rsp 10: push %rbx 11: push %r13 13: push %r14 15: push %r15 17: pushq $0x0 _ 19: xor %edx,%edx |_ index (arg 3) 1b: movabs $0xffff88d95cc82600,%rsi |_ map (arg 2) 25: mov %edx,%edx | index >= array->map.max_entries 27: cmp %edx,0x24(%rsi) | 2a: jbe 0x0000000000000066 |_ 2c: mov -0x224(%rbp),%eax | tail call limit check 32: cmp $0x20,%eax | 35: ja 0x0000000000000066 | 37: add $0x1,%eax | 3a: mov %eax,-0x224(%rbp) |_ 40: mov 0xd0(%rsi,%rdx,8),%rax |_ prog = array->ptrs[index] 48: test %rax,%rax | prog == NULL check 4b: je 0x0000000000000066 |_ 4d: mov 0x30(%rax),%rax | goto *(prog->bpf_func + prologue_size) 51: add $0x19,%rax | 55: callq 0x0000000000000061 | retpoline for indirect jump 5a: pause | 5c: lfence | 5f: jmp 0x000000000000005a | 61: mov %rax,(%rsp) | 65: retq |_ 66: mov $0x1,%eax 6b: pop %rbx 6c: pop %r15 6e: pop %r14 70: pop %r13 72: pop %rbx 73: leaveq 74: retq After; state after JIT: # ./bpftool p d j i 1655 0xffffffffc08e8930: 0: nopl 0x0(%rax,%rax,1) 5: push %rbp 6: mov %rsp,%rbp 9: sub $0x200,%rsp 10: push %rbx 11: push %r13 13: push %r14 15: push %r15 17: pushq $0x0 _ 19: xor %edx,%edx |_ index (arg 3) 1b: movabs $0xffff9d8afd74c000,%rsi |_ map (arg 2) 25: mov -0x224(%rbp),%eax | tail call limit check 2b: cmp $0x20,%eax | 2e: ja 0x000000000000003e | 30: add $0x1,%eax | 33: mov %eax,-0x224(%rbp) |_ 39: jmpq 0xfffffffffffd1785 |_ [direct] goto *(prog->bpf_func + prologue_size) 3e: mov $0x1,%eax 43: pop %rbx 44: pop %r15 46: pop %r14 48: pop %r13 4a: pop %rbx 4b: leaveq 4c: retq After; state after map update (target prog): # ./bpftool p d j i 1655 0xffffffffc08e8930: 0: nopl 0x0(%rax,%rax,1) 5: push %rbp 6: mov %rsp,%rbp 9: sub $0x200,%rsp 10: push %rbx 11: push %r13 13: push %r14 15: push %r15 17: pushq $0x0 19: xor %edx,%edx 1b: movabs $0xffff9d8afd74c000,%rsi 25: mov -0x224(%rbp),%eax 2b: cmp $0x20,%eax . 2e: ja 0x000000000000003e . 30: add $0x1,%eax . 33: mov %eax,-0x224(%rbp) |_ 39: jmpq 0xffffffffffb09f55 |_ goto *(prog->bpf_func + prologue_size) 3e: mov $0x1,%eax 43: pop %rbx 44: pop %r15 46: pop %r14 48: pop %r13 4a: pop %rbx 4b: leaveq 4c: retq After; state after map update (no prog): # ./bpftool p d j i 1655 0xffffffffc08e8930: 0: nopl 0x0(%rax,%rax,1) 5: push %rbp 6: mov %rsp,%rbp 9: sub $0x200,%rsp 10: push %rbx 11: push %r13 13: push %r14 15: push %r15 17: pushq $0x0 19: xor %edx,%edx 1b: movabs $0xffff9d8afd74c000,%rsi 25: mov -0x224(%rbp),%eax 2b: cmp $0x20,%eax . 2e: ja 0x000000000000003e . 30: add $0x1,%eax . 33: mov %eax,-0x224(%rbp) |_ 39: nopl 0x0(%rax,%rax,1) |_ fall-through nop 3e: mov $0x1,%eax 43: pop %rbx 44: pop %r15 46: pop %r14 48: pop %r13 4a: pop %rbx 4b: leaveq 4c: retq Nice bonus is that this also shrinks the code emission quite a bit for every tail call invocation. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/6ada4c1c9d35eeb5f4ecfab94593dafa6b5c4b09.1574452833.git.daniel@iogearbox.net
-
Daniel Borkmann authored
Add tracking of constant keys into tail call maps. The signature of bpf_tail_call_proto is that arg1 is ctx, arg2 map pointer and arg3 is a index key. The direct call approach for tail calls can be enabled if the verifier asserted that for all branches leading to the tail call helper invocation, the map pointer and index key were both constant and the same. Tracking of map pointers we already do from prior work via c93552c4 ("bpf: properly enforce index mask to prevent out-of-bounds speculation") and 09772d92 ("bpf: avoid retpoline for lookup/update/ delete calls on maps"). Given the tail call map index key is not on stack but directly in the register, we can add similar tracking approach and later in fixup_bpf_calls() add a poke descriptor to the progs poke_tab with the relevant information for the JITing phase. We internally reuse insn->imm for the rewritten BPF_JMP | BPF_TAIL_CALL instruction in order to point into the prog's poke_tab, and keep insn->imm as 0 as indicator that current indirect tail call emission must be used. Note that publishing to the tracker must happen at the end of fixup_bpf_calls() since adding elements to the poke_tab reallocates its memory, so we need to wait until its in final state. Future work can generalize and add similar approach to optimize plain array map lookups. Difference there is that we need to look into the key value that sits on stack. For clarity in bpf_insn_aux_data, map_state has been renamed into map_ptr_state, so we get map_{ptr,key}_state as trackers. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/e8db37f6b2ae60402fa40216c96738ee9b316c32.1574452833.git.daniel@iogearbox.net
-
Daniel Borkmann authored
This work adds program tracking to prog array maps. This is needed such that upon prog array updates/deletions we can fix up all programs which make use of this tail call map. We add ops->map_poke_{un,}track() helpers to maps to maintain the list of programs and ops->map_poke_run() for triggering the actual update. bpf_array_aux is extended to contain the list head and poke_mutex in order to serialize program patching during updates/deletions. bpf_free_used_maps() will untrack the program shortly before dropping the reference to the map. For clearing out the prog array once all urefs are dropped we need to use schedule_work() to have a sleepable context. The prog_array_map_poke_run() is triggered during updates/deletions and walks the maintained prog list. It checks in their poke_tabs whether the map and key is matching and runs the actual bpf_arch_text_poke() for patching in the nop or new jmp location. Depending on the type of update, we use one of BPF_MOD_{NOP_TO_JUMP,JUMP_TO_NOP,JUMP_TO_JUMP}. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/1fb364bb3c565b3e415d5ea348f036ff379e779d.1574452833.git.daniel@iogearbox.net
-
Daniel Borkmann authored
Add initial poke table data structures and management to the BPF prog that can later be used by JITs. Also add an instance of poke specific data for tail call maps; plan for later work is to extend this also for BPF static keys. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/1db285ec2ea4207ee0455b3f8e191a4fc58b9ade.1574452833.git.daniel@iogearbox.net
-
Daniel Borkmann authored
We're going to extend this with further information which is only relevant for prog array at this point. Given this info is not used in critical path, move it into its own structure such that the main array map structure can be kept on diet. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/b9ddccdb0f6f7026489ee955f16c96381e1e7238.1574452833.git.daniel@iogearbox.net
-
Daniel Borkmann authored
We later on are going to need a sleepable context as opposed to plain RCU callback in order to untrack programs we need to poke at runtime and tracking as well as image update is performed under mutex. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/09823b1d5262876e9b83a8e75df04cf0467357a4.1574452833.git.daniel@iogearbox.net
-
Daniel Borkmann authored
Add BPF_MOD_{NOP_TO_JUMP,JUMP_TO_JUMP,JUMP_TO_NOP} patching for x86 JIT in order to be able to patch direct jumps or nop them out. We need this facility in order to patch tail call jumps and in later work also BPF static keys. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/aa4784196a8e5e985af4b30a4fe5336bce6e9643.1574452833.git.daniel@iogearbox.net
-
Alexei Starovoitov authored
Add a test that benchmarks different ways of attaching BPF program to a kernel function. Here are the results for 2.4Ghz x86 cpu on a kernel without mitigations: $ ./test_progs -n 49 -v|grep events task_rename base 2743K events per sec task_rename kprobe 2419K events per sec task_rename kretprobe 1876K events per sec task_rename raw_tp 2578K events per sec task_rename fentry 2710K events per sec task_rename fexit 2685K events per sec On a kernel with retpoline: $ ./test_progs -n 49 -v|grep events task_rename base 2401K events per sec task_rename kprobe 1930K events per sec task_rename kretprobe 1485K events per sec task_rename raw_tp 2053K events per sec task_rename fentry 2351K events per sec task_rename fexit 2185K events per sec All 5 approaches: - kprobe/kretprobe in __set_task_comm() - raw tracepoint in trace_task_rename() - fentry/fexit in __set_task_comm() are roughly equivalent. __set_task_comm() by itself is quite fast, so any extra instructions add up. Until BPF trampoline was introduced the fastest mechanism was raw tracepoint. kprobe via ftrace was second best. kretprobe is slow due to trap. New fentry/fexit methods via BPF trampoline are clearly the fastest and the difference is more pronounced with retpoline on, since BPF trampoline doesn't use indirect jumps. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20191122011515.255371-1-ast@kernel.org
-
Alexei Starovoitov authored
Yonghong Song says: ==================== With latest llvm, bpf selftest test_progs, which has +alu32 enabled, failed for strobemeta.o and a few other subtests. The reason is due to that verifier did not provide better var_off.mask after jmp32 instructions. This patch set addressed this issue and after the fix, test_progs passed with alu32. Patch #1 provided detailed explanation of the problem and the fix. Patch #2 added three tests in test_verifier. Changelog: v1 -> v2: - do not directly manipulate tnum.{value,mask} in __reg_bound_offset32(), using tnum_lshift/tnum_rshift functions instead - do __reg_bound_offset32() after regular 64bit __reg_bound_offset() since the latter may give a better upper 32bit var_off, which can be inherited by __reg_bound_offset32(). ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Andrii Nakryiko authored
test_core_reloc_kernel.c selftest is the only CO-RE test that reads and returns for validation calling thread's information (pid, tgid, comm). Thus it has to make sure that only test_prog's invocations are honored. Fixes: df36e621 ("selftests/bpf: add CO-RE relocs testing setup") Reported-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20191121175900.3486133-1-andriin@fb.com
-
Yonghong Song authored
Three test cases are added. Test 1: jmp32 'reg op imm'. Test 2: jmp32 'reg op reg' where dst 'reg' has unknown constant and src 'reg' has known constant Test 3: jmp32 'reg op reg' where dst 'reg' has known constant and src 'reg' has unknown constant Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191121170651.449096-1-yhs@fb.com
-
Andrii Nakryiko authored
If bpf_object__open_file() gets path like "some/dir/obj.o", it should derive BPF object's name as "obj" (unless overriden through opts->object_name). Instead, due to using `path` as a fallback value for opts->obj_name, path is used as is for object name, so for above example BPF object's name will be verbatim "some/dir/obj", which leads to all sorts of troubles, especially when internal maps are concern (they are using up to 8 characters of object name). Fix that by ensuring object_name stays NULL, unless overriden. Fixes: 291ee02b ("libbpf: Refactor bpf_object__open APIs to use common opts") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191122003527.551556-1-andriin@fb.com
-
Yonghong Song authored
With latest llvm (trunk https://github.com/llvm/llvm-project), test_progs, which has +alu32 enabled, failed for strobemeta.o. The verifier output looks like below with edit to replace large decimal numbers with hex ones. 193: (85) call bpf_probe_read_user_str#114 R0=inv(id=0) 194: (26) if w0 > 0x1 goto pc+4 R0_w=inv(id=0,umax_value=0xffffffff00000001) 195: (6b) *(u16 *)(r7 +80) = r0 196: (bc) w6 = w0 R6_w=inv(id=0,umax_value=0xffffffff,var_off=(0x0; 0xffffffff)) 197: (67) r6 <<= 32 R6_w=inv(id=0,smax_value=0x7fffffff00000000,umax_value=0xffffffff00000000, var_off=(0x0; 0xffffffff00000000)) 198: (77) r6 >>= 32 R6=inv(id=0,umax_value=0xffffffff,var_off=(0x0; 0xffffffff)) ... 201: (79) r8 = *(u64 *)(r10 -416) R8_w=map_value(id=0,off=40,ks=4,vs=13872,imm=0) 202: (0f) r8 += r6 R8_w=map_value(id=0,off=40,ks=4,vs=13872,umax_value=0xffffffff,var_off=(0x0; 0xffffffff)) 203: (07) r8 += 9696 R8_w=map_value(id=0,off=9736,ks=4,vs=13872,umax_value=0xffffffff,var_off=(0x0; 0xffffffff)) ... 255: (bf) r1 = r8 R1_w=map_value(id=0,off=9736,ks=4,vs=13872,umax_value=0xffffffff,var_off=(0x0; 0xffffffff)) ... 257: (85) call bpf_probe_read_user_str#114 R1 unbounded memory access, make sure to bounds check any array access into a map The value range for register r6 at insn 198 should be really just 0/1. The umax_value=0xffffffff caused later verification failure. After jmp instructions, the current verifier already tried to use just obtained information to get better register range. The current mechanism is for 64bit register only. This patch implemented to tighten the range for 32bit sub-registers after jmp32 instructions. With the patch, we have the below range ranges for the above code sequence: 193: (85) call bpf_probe_read_user_str#114 R0=inv(id=0) 194: (26) if w0 > 0x1 goto pc+4 R0_w=inv(id=0,smax_value=0x7fffffff00000001,umax_value=0xffffffff00000001, var_off=(0x0; 0xffffffff00000001)) 195: (6b) *(u16 *)(r7 +80) = r0 196: (bc) w6 = w0 R6_w=inv(id=0,umax_value=0xffffffff,var_off=(0x0; 0x1)) 197: (67) r6 <<= 32 R6_w=inv(id=0,umax_value=0x100000000,var_off=(0x0; 0x100000000)) 198: (77) r6 >>= 32 R6=inv(id=0,umax_value=1,var_off=(0x0; 0x1)) ... 201: (79) r8 = *(u64 *)(r10 -416) R8_w=map_value(id=0,off=40,ks=4,vs=13872,imm=0) 202: (0f) r8 += r6 R8_w=map_value(id=0,off=40,ks=4,vs=13872,umax_value=1,var_off=(0x0; 0x1)) 203: (07) r8 += 9696 R8_w=map_value(id=0,off=9736,ks=4,vs=13872,umax_value=1,var_off=(0x0; 0x1)) ... 255: (bf) r1 = r8 R1_w=map_value(id=0,off=9736,ks=4,vs=13872,umax_value=1,var_off=(0x0; 0x1)) ... 257: (85) call bpf_probe_read_user_str#114 ... At insn 194, the register R0 has better var_off.mask and smax_value. Especially, the var_off.mask ensures later lshift and rshift maintains proper value range. Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191121170650.449030-1-yhs@fb.com
-
Toke Høiland-Jørgensen authored
Tetsuo pointed out that it was not only the device unregister hook that was broken for devmap_hash types, it was also cleanup on map free. So better fix this as well. While we're at it, there's no reason to allocate the netdev_map array for DEVMAP_HASH, so skip that and adjust the cost accordingly. Fixes: 6f9d451a ("xdp: Add devmap_hash map type for looking up devices by hashed index") Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20191121133612.430414-1-toke@redhat.com
-
Alexei Starovoitov authored
Andrii Nakryiko says: ==================== This patch set salvages all the non-extern-specific changes out of blocked externs patch set ([0]). In addition to small clean ups, it also refactors libbpf's handling of relocations and allows support for global (non-static) variables. [0] https://patchwork.ozlabs.org/project/netdev/list/?series=143358&state=* ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Andrii Nakryiko authored
Add exra level of verboseness, activated by -vvv argument. When -vv is specified, verbose libbpf and verifier log (level 1) is output, even for successful tests. With -vvv, verifier log goes to level 2. This is extremely useful to debug verifier failures, as well as just see the state and flow of verification. Before this, you'd have to go and modify load_program()'s source code inside libbpf to specify extra log_level flags, which is suboptimal to say the least. Currently -vv and -vvv triggering verifier output is integrated into test_stub's bpf_prog_load as well as bpf_verif_scale.c tests. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191120003548.4159797-1-andriin@fb.com
-
Andrii Nakryiko authored
Initialized global variables are no different in ELF from static variables, and don't require any extra support from libbpf. But they are matching semantics of global data (backed by BPF maps) more closely, preventing LLVM/Clang from aggressively inlining constant values and not requiring volatile incantations to prevent those. This patch enables global variables. It still disables uninitialized variables, which will be put into special COM (common) ELF section, because BPF doesn't allow uninitialized data to be accessed. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191121070743.1309473-5-andriin@fb.com
-
Jakub Kicinski authored
If selftests are copied over to another machine/location for execution the build test of bpftool will obviously not work, since the sources are not copied. Skip it if we can't find bpftool's Makefile. Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191119105010.19189-3-quentin.monnet@netronome.com
-
Andrii Nakryiko authored
Fix a bunch of warnings and errors reported by checkpatch.pl, to make it easier to spot new problems. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191121070743.1309473-4-andriin@fb.com
-