- 30 Jun, 2018 9 commits
-
-
Daniel Borkmann authored
Jakub Kicinski says: ==================== Set of random updates to bpftool and libbpf. I'm preparing for extending bpftool prog load, but there is a good number of improvements that can be made before bpf -> bpf-next merge helping to keep the later patch set to a manageable size as well. First patch is a bpftool build speed improvement. Next missing program types are added to libbpf program type detection by section name. The ability to load programs from '.text' section is restored when ELF file doesn't contain any pseudo calls. In bpftool I remove my Author comments as unnecessary sign of vanity. Last but not least missing option is added to bash completions and processing of options in bash completions is improved. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Jakub Kicinski authored
Remove options (in getopt() sense, i.e. starting with a dash like -n or --NAME) while parsing arguments for bash completions. This allows us to refer to position-dependent parameters better, and complete options at any point. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Jakub Kicinski authored
--bpffs is not suggested by bash completions. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Jakub Kicinski authored
Drop my author comments, those are from the early days of bpftool and make little sense in tree, where we have quite a few people contributing and git to attribute the work. While at it bump some copyrights. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Jakub Kicinski authored
Make bpf_program__next() skip over '.text' section if object file has pseudo calls. The '.text' section is hardly a program in that case, it's more of a storage for code of functions other than main. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Jakub Kicinski authored
libbpf used to be able to load programs from the default section called '.text'. It's not very common to leave sections unnamed, but if it happens libbpf will fail to load the programs reporting -EINVAL from the kernel. The -EINVAL comes from bpf_obj_name_cpy() because since 48cca7e4 ("libbpf: add support for bpf_call") libbpf does not resolve program names for programs in '.text', defaulting to '.text'. '.text', however, does not pass the (isalnum(*src) || *src == '_') check in bpf_obj_name_cpy(). With few extra lines of code we can limit the pseudo call assumptions only to objects which actually contain code relocations. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Jakub Kicinski authored
Users of bpf_object__open()/bpf_object__load() APIs may want to load the programs and maps onto a device for offload. Allow setting ifindex on those sub-objects. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Jakub Kicinski authored
Specify default section names for BPF_PROG_TYPE_LIRC_MODE2 and BPF_PROG_TYPE_LWT_SEG6LOCAL, these are the only two missing right now. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Jakub Kicinski authored
Commit 4bfe3bd3 ("tools/bpftool: use version from the kernel source tree") added version to bpftool. The version used is equal to the kernel version and obtained by running make kernelversion against kernel source tree. Version is then communicated to the sources with a command line define set in CFLAGS. Use a simply expanded variable for the version, otherwise the recursive make will run every time CFLAGS are used. This brings the single-job compilation time for me from almost 16 sec down to less than 4 sec. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
- 28 Jun, 2018 2 commits
-
-
Jesper Dangaard Brouer authored
XDP_TX requires also changing the MAC-addrs, else some hardware may drop the TX packet before reaching the wire. This was observed with driver mlx5. If xdp_rxq_info select --action XDP_TX the swapmac functionality is activated. It is also possible to manually enable via cmdline option --swapmac. This is practical if wanting to measure the overhead of writing/updating payload for other action types. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Jesper Dangaard Brouer authored
There is a cost associated with reading the packet data payload that this test ignored. Add option --read to allow enabling reading part of the payload. This sample/tool helps us analyse an issue observed with a NIC mlx5 (ConnectX-5 Ex) and an Intel(R) Xeon(R) CPU E5-1650 v4. With no_touch of data: Running XDP on dev:mlx5p1 (ifindex:8) action:XDP_DROP options:no_touch XDP stats CPU pps issue-pps XDP-RX CPU 0 14,465,157 0 XDP-RX CPU 1 14,464,728 0 XDP-RX CPU 2 14,465,283 0 XDP-RX CPU 3 14,465,282 0 XDP-RX CPU 4 14,464,159 0 XDP-RX CPU 5 14,465,379 0 XDP-RX CPU total 86,789,992 When not touching data, we observe that the CPUs have idle cycles. When reading data the CPUs are 100% busy in softirq. With reading data: Running XDP on dev:mlx5p1 (ifindex:8) action:XDP_DROP options:read XDP stats CPU pps issue-pps XDP-RX CPU 0 9,620,639 0 XDP-RX CPU 1 9,489,843 0 XDP-RX CPU 2 9,407,854 0 XDP-RX CPU 3 9,422,289 0 XDP-RX CPU 4 9,321,959 0 XDP-RX CPU 5 9,395,242 0 XDP-RX CPU total 56,657,828 The effect seen above is a result of cache-misses occuring when more RXQs are being used. Based on perf-event observations, our conclusion is that the CPUs DDIO (Direct Data I/O) choose to deliver packet into main memory, instead of L3-cache. We also found, that this can be mitigated by either using less RXQs or by reducing NICs the RX-ring size. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
- 27 Jun, 2018 12 commits
-
-
Andrey Ignatov authored
TCP Fast Open is triggered by sys_sendmsg with MSG_FASTOPEN flag for SOCK_STREAM socket. Even though it's sys_sendmsg, it eventually calls __inet_stream_connect the same way sys_connect does for TCP. __inet_stream_connect, in turn, already has BPF hooks for sys_connect. That means TFO is already covered by BPF_CGROUP_INET{4,6}_CONNECT and the only missing piece is selftest. The patch adds selftest for TFO. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Toke Høiland-Jørgensen authored
Add an example program showing how to sample packets from XDP using the perf event buffer. The example userspace program just prints the ethernet header for every packet sampled. Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Toke Høiland-Jørgensen authored
Add two new helper functions to trace_helpers that supports polling multiple perf file descriptors for events. These are used to the XDP perf_event_output example, which needs to work with one perf fd per CPU. Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Jiong Wang authored
Map read has been supported on NFP, this patch enables optimization for memcpy from map to packet. This patch also fixed one latent bug which will cause copying from unexpected address once memcpy for map pointer enabled. The fixed code path was not exercised before. Reported-by: Mary Pham <mary.pham@netronome.com> Reported-by: David Beckett <david.beckett@netronome.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
David S. Miller authored
Petr Machata says: ==================== Multipath tests for tunnel devices This patchset adds a test for ECMP and weighted ECMP between two GRE tunnels. In patches #1 and #2, the function multipath_eval() is first moved from router_multipath.sh to lib.sh for ease of reuse, and then fixed up. In patch #3, the function tc_rule_stats_get() is parameterized to be useful for egress rules as well. In patch #4, a new function __simple_if_init() is extracted from simple_if_init(). This covers the logic that needs to be done for the usual interface: VRF migration, upping and installation of IP addresses. Patch #5 then adds the test itself. Additionally in patch #6, a requirement to add diagrams to selftests is documented. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
ASCII art diagrams are well suited for presenting the topology that a test uses while being easy to embed directly in the test file iteslf. They make the information very easy to grasp even for simple topologies, and for more complex ones they are almost essential, as figuring out the interconnects from the script itself proves to be difficult. Therefore state the requirement for topology ASCII art in README. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
Add a GRE-tunneling test such that there are two tunnels involved, with a multipath route listing both as next hops. Similarly to router_multipath.sh, test that the distribution of traffic to the tunnels honors the configured weights. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
The function simple_if_init() does two things: it creates a VRF, then moves an interface into this VRF and configures addresses. The latter comes in handy when adding more interfaces into a VRF later on. The situation is similar for simple_if_fini(). Therefore split the interface remastering and address de/initialization logic to a new pair of helpers __simple_if_init() / __simple_if_fini(), and defer to these helpers from simple_if_init() and simple_if_fini(). Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
The GRE multipath tests need stats on an egress counter. Change tc_rule_stats_get() to take direction as an optional argument, with default of ingress. Take the opportunity to change line continuation character from | to \. Move the | to the next line, which indent. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
- Change the indentation of the function body from 7 spaces to one tab. - Move initialization of weights_ratio up so that it can be referenced from the error message about packet difference being zero. - Move |'s consistently to continuation line, which reindent. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
This function will be useful for the GRE multipath test that is coming later. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Kees Cook authored
It looks like the prior VLA removal, commit b16520f7 ("net/tls: Remove VLA usage"), and a new VLA addition, commit c46234eb ("tls: RX path for ktls"), passed in the night. This removes the newly added VLA, which happens to have its bounds based on the same max value. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 26 Jun, 2018 17 commits
-
-
Petr Machata authored
The IP addresses of tunnel endpoint at H3 are set at the VLAN device $h3.555. Therefore when test_gretap_untagged_egress() sets vlan 555 to egress untagged at $swp3, $h3's rp_filter rejects these packets. The test then spuriously fails. Therefore turn off net.ipv4.conf.{all, $h3}.rp_filter. Fixes: 9c7c8a82 ("selftests: forwarding: mirror_gre_vlan_bridge_1q: Add more tests") Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Kees Cook authored
In the quest to remove all stack VLA usage from the kernel[1], this allocates the values buffer during the callback instead of putting it on the stack. [1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.comSigned-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Jakub Kicinski says: ==================== net: sched: support replay of filter offload when binding to block This series from John adds the ability to replay filter offload requests when new offload callback is being registered on a TC block. This is most likely to take place for shared blocks today, when a block which already has rules is bound to another interface. Prior to this patch set if any of the rules were offloaded the block bind would fail. A new tcf_proto_op is added to generate a filter-specific offload request. The new 'offload' op is supporting extack from day 0, hence we need to propagate extack to .ndo_setup_tc TC_BLOCK_BIND/TC_BLOCK_UNBIND and through tcf_block_cb_register() to tcf_block_playback_offloads(). The immediate use of this patch set is to simplify life of drivers which require duplicating rules when sharing blocks. Switch drivers (mlxsw) can bind ports to rule lists dynamically, NIC drivers generally don't have that ability and need the rules to be duplicated for each ingress they match on. In code terms this means that switch drivers don't register multiple callbacks for each port. NIC drivers do, and get a separate request and hance rule per-port, as if the block was not shared. The registration fails today, however, if some rules were already present. As John notes in description of patch 7, drivers which register multiple callbacks to shared blocks will likely need to flush the rules on block unbind. This set makes the core not only replay the the offload add requests but also offload remove requests when callback is unregistered. v2: - name parameters in patch 2; - use unsigned int instead of u32 for in_hw_coun; - improve extack message in patch 7. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
John Hurley authored
Call the reoffload tcf_proto_op on all tcf_proto nodes in all chains of a block when a callback tries to register to a block that already has offloaded rules. If all existing rules cannot be offloaded then the registration is rejected. This replaces the previous policy of rejecting such callback registration outright. On unregistration of a callback, the rules are flushed for that given cb. The implementation of block sharing in the NFP driver, for example, duplicates shared rules to all devs bound to a block. This meant that rules could still exist in hw even after a device is unbound from a block (assuming the block still remains active). Signed-off-by: John Hurley <john.hurley@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
John Hurley authored
Add the offload tcf_proto_op in cls_bpf to generate an offload message for each bpf prog in the given tcf_proto. Call the specified callback with this new offload message. The function only returns an error if the callback rejects adding a 'hardware only' prog. A prog contains a flag to indicate if it is in hardware or not. To ensure the offload function properly maintains this flag, keep a reference counter for the number of instances of the prog that are in hardware. Only update the flag when this counter changes from or to 0. Signed-off-by: John Hurley <john.hurley@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
John Hurley authored
Add the offload tcf_proto_op in cls_u32 to generate an offload message for each filter and the hashtable in the given tcf_proto. Call the specified callback with this new offload message. The function only returns an error if the callback rejects adding a 'hardware only' rule. A filter contains a flag to indicate if it is in hardware or not. To ensure the offload function properly maintains this flag, keep a reference counter for the number of instances of the filter that are in hardware. Only update the flag when this counter changes from or to 0. Signed-off-by: John Hurley <john.hurley@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
John Hurley authored
Add the reoffload tcf_proto_op in matchall to generate an offload message for each filter in the given tcf_proto. Call the specified callback with this new offload message. The function only returns an error if the callback rejects adding a 'hardware only' rule. Ensure matchall flags correctly report if the rule is in hw by keeping a reference counter for the number of instances of the rule offloaded. Only update the flag when this counter changes from or to 0. Signed-off-by: John Hurley <john.hurley@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
John Hurley authored
Add the reoffload tcf_proto_op in flower to generate an offload message for each filter in the given tcf_proto. Call the specified callback with this new offload message. The function only returns an error if the callback rejects adding a 'hardware only' rule. A filter contains a flag to indicate if it is in hardware or not. To ensure the reoffload function properly maintains this flag, keep a reference counter for the number of instances of the filter that are in hardware. Only update the flag when this counter changes from or to 0. Add a generic helper function to implement this behaviour. Signed-off-by: John Hurley <john.hurley@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
John Hurley authored
Create a new tcf_proto_op called 'reoffload' that generates a new offload message for each node in a tcf_proto. Pointers to the tcf_proto and whether the offload request is to add or delete the node are included. Also included is a callback function to send the offload message to and the option of priv data to go with the cb. Signed-off-by: John Hurley <john.hurley@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
John Hurley authored
Pass the extact struct from a tc qdisc add to the block bind function and, in turn, to the setup_tc ndo of binding device via the tc_block_offload struct. Pass this back to any block callback registrations to allow netlink logging of fails in the bind process. Signed-off-by: John Hurley <john.hurley@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Sergei Shtylyov says: ==================== sh_eth: RPADIR related clean-ups Here's a set of 2 patches against DaveM's 'net-next.git' repo. They are clean-ups related to RPADIR (DMA padding to NET_IP_ALIGN)... ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Sergei Shtylyov authored
If RPADIR exists, the value written to it is always the same for all SoCs (and derived from NET_IP_ALIGN), so there has not been any need to store it in the *struct* sh_eth_cpu_data... Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Sergei Shtylyov authored
The *enum* RPADIR_BIT was declared in the commit 86a74ff2 ("net: sh_eth: add support for Renesas SuperH Ethernet") adding SH771x support, however the SH771x manual doesn't have the RPADIR register described and, moreover, tells why the padding insertion must not be used. The newer SoC manuals do have RPADIR documented, though with somewhat different layout -- update the *enum* according to these manuals... Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Heiner Kallweit authored
So far unsupported WoL options are silently ignored. Change this and reject attempts to set unsupported options. This prevents situations where a user tries to set an unsupported WoL option and is under the impression it was successful because ethtool doesn't complain. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
Commit 5691484d ("net: ip6_gre: Fix headroom request in ip6erspan_tunnel_xmit()") and commit 01b8d064 ("net: ip6_gre: Request headroom in __gre6_xmit()") fix problems in reserving headroom in the packets tunneled through ip6gre/tap and ip6erspan netdevices. These two patches included snippets that reproduced the issues. This patch elevates the snippets to a full-fledged test case. Suggested-by: David Miller <davem@davemloft.net> Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Guillaume Nault says: ==================== l2tp: trivial cleanups Just a set of unrelated trivial cleanups (remove unused code, make local functions static, etc.). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Guillaume Nault authored
It always returns 0, and nobody reads the return value anyway. Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
-