- 18 Dec, 2015 5 commits
-
-
Stephen Hemminger authored
-
Paolo Abeni authored
Currently parse_encap_ip() does not correctly update argv/argc; if multiple lwtunnel arguments are provided, parsing fails after the first one, e.g.

  ip route add 172.16.101.0/24 dev vxlan1 encap ip id 42 dst 192.168.255.1

fails with:

  Error: either "to" is duplicate, or "dst" is a garbage.

This commit addresses the issue by stepping to the next argument at each iteration of the parsing loop.

Fixes: 1e529305 ("lwtunnel: Add encapsulation support to ip route")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
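A minimal sketch of the kind of fix described, assuming a simplified keyword/value parser. The struct encap_opts type and parse_encap_sketch() are hypothetical illustrations, not the real parse_encap_ip() code. The point is that the loop steps past each consumed value on every iteration and hands the updated argc/argv back to the caller:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the encap options;
 * not the real iproute2 structures. */
struct encap_opts {
	unsigned long id;
	const char *dst;
};

/* Parse "id <n>" and "dst <addr>" pairs. On return, *argcp and *argvp
 * describe the first argument this parser did not consume, so the caller
 * can continue from there. Returns 0 on success, -1 if a keyword has no
 * value. */
static int parse_encap_sketch(struct encap_opts *o, int *argcp, char ***argvp)
{
	int argc = *argcp;
	char **argv = *argvp;

	while (argc > 0) {
		if (strcmp(*argv, "id") == 0) {
			if (argc < 2)
				return -1;
			argc--; argv++;		/* move from keyword to value */
			o->id = strtoul(*argv, NULL, 0);
		} else if (strcmp(*argv, "dst") == 0) {
			if (argc < 2)
				return -1;
			argc--; argv++;		/* move from keyword to value */
			o->dst = *argv;
		} else {
			break;			/* not ours: leave it for the caller */
		}
		/* Step past the value just consumed, on every iteration. */
		argc--; argv++;
	}

	*argcp = argc;
	*argvp = argv;
	return 0;
}
```

Passing argc and argv by pointer is one way to keep the caller and the sub-parser in sync; the actual patch may structure this differently.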
-
Phil Sutter authored
Commit 0f754332 ("route: ignore RTAX_HOPLIMIT of value -1") accidentally reordered fprintf statements. This patch restores the original ordering. Fixes: 0f754332 ("route: ignore RTAX_HOPLIMIT of value -1") Signed-off-by: Phil Sutter <phil@nwl.cc>
-
Konstantin Khlebnikov authored
Though dumping such entries crashes present kernels. Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
-
Tom Herbert authored
This patch:
  - Adds a utility function for parsing a 64 bit address
  - Adds a utility function for converting a 64 bit address to ASCII
  - Adds an ILA encap type in lwt tunnels

Signed-off-by: Tom Herbert <tom@herbertland.com>
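As a rough illustration of the two utility functions, here is a sketch that assumes a textual form of four colon-separated 16-bit hexadecimal groups. That format, and the names parse_addr64_sketch() and format_addr64_sketch(), are assumptions made for this example and are not taken from the patch:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Parse "hhhh:hhhh:hhhh:hhhh" into a 64-bit value (first group is the
 * most significant). Returns 0 on success, -1 on malformed input. */
static int parse_addr64_sketch(const char *s, uint64_t *out)
{
	unsigned int g[4];
	char trailing;
	int i;

	/* The extra %c only matches if garbage follows the fourth group. */
	if (sscanf(s, "%4x:%4x:%4x:%4x%c",
		   &g[0], &g[1], &g[2], &g[3], &trailing) != 4)
		return -1;

	/* Reject groups that do not fit in 16 bits (e.g. a negative sign). */
	for (i = 0; i < 4; i++)
		if (g[i] > 0xffff)
			return -1;

	*out = ((uint64_t)g[0] << 48) | ((uint64_t)g[1] << 32) |
	       ((uint64_t)g[2] << 16) | (uint64_t)g[3];
	return 0;
}

/* Format a 64-bit value as "hhhh:hhhh:hhhh:hhhh" into buf and return buf.
 * The caller should provide at least 20 bytes. */
static const char *format_addr64_sketch(uint64_t v, char *buf, size_t len)
{
	snprintf(buf, len, "%04x:%04x:%04x:%04x",
		 (unsigned int)((v >> 48) & 0xffff),
		 (unsigned int)((v >> 32) & 0xffff),
		 (unsigned int)((v >> 16) & 0xffff),
		 (unsigned int)(v & 0xffff));
	return buf;
}
```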
-
- 10 Dec, 2015 7 commits
-
-
Daniel Borkmann authored
Improve example files further and add a more generic set of possible helpers for them that can be used. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org>
-
Stephen Hemminger authored
-
Stephen Hemminger authored
The tunnel code was doing sscanf(buf, "%ld", &x) where x was unsigned long.
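For illustration, a small sketch of the matching conversion: an unsigned long destination pairs with "%lu", whereas "%ld" expects a pointer to a signed long. The helper name parse_counter_sketch() is hypothetical:

```c
#include <stdio.h>

/* Parse an unsigned decimal number from buf into *x.
 * Returns 0 on success, -1 if no number could be read. */
static int parse_counter_sketch(const char *buf, unsigned long *x)
{
	/* "%lu" matches the unsigned long destination. */
	return sscanf(buf, "%lu", x) == 1 ? 0 : -1;
}
```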
-
Phil Sutter authored
Just a typo there; it's spelled correctly in the SEE ALSO section. Signed-off-by: Phil Sutter <phil@nwl.cc>
-
David Ahern authored
Currently, the table id for VRF devices requires an integer. Convert it to use rtnl_rttable_a2n which handles table names from the iproute2 directory. This also fixes a bug in the original commit where table names are not properly handled. Fixes: 15faa0a3 ("add support for VRF device") Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
-
Nicolas Dichtel authored
There are two variables named 'len' in rtnl_talk. In fact, commit c079e121 didn't work. For example, it was possible to trigger a seg fault with this command:

  $ ip link set gre2 type ip6gre hoplimit 32

Let's rename the argument len to maxlen.

Fixes: c079e121 ("libnetlink: add size argument to rtnl_talk")
Reported-by: Thomas Faivre <thomas.faivre@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
-
Phil Sutter authored
Older kernels use -1 internally as indicator to use the sysctl default, but they still export the setting. Newer kernels use 0 to indicate that (which is why the conversion from -1 to 0 was done here), but they also stopped exporting the value. Since the meaning of -1 is clear, treat it equally like default on newer kernels (which is to not print anything). Signed-off-by: Phil Sutter <phil@nwl.cc>
-
- 29 Nov, 2015 23 commits
-
-
Stephen Hemminger authored
Make iptunnel pass checkpatch (mostly).
-
Konstantin Shemyak authored
On 24.11.2015 02:26, Stephen Hemminger wrote:
> On Thu, 12 Nov 2015 21:10:08 +0000
> Konstantin Shemyak <konstantin@shemyak.com> wrote:
>
>> When creating an IP tunnel over IPv6, the address family must be passed in
>> the option, e.g.
>>
>> ip -6 tunnel add mode ip6gre local 1::1 remote 2::2
>>
>> This makes it impossible to create both IPv4 and IPv6 tunnels in one batch.
>>
>> In fact the address family option is redundant here, as each tunnel mode is
>> relevant for only one address family.
>> The patch determines whether the applicable address family is AF_INET6
>> instead of the default AF_INET and makes the "-6" option unnecessary for
>> "ip tunnel add".
>>
>> Signed-off-by: Konstantin Shemyak <konstantin@shemyak.com>
>> ---
>> ip/iptunnel.c | 26 ++++++++++++++++++++++++++
>> testsuite/tests/ip/tunnel/add_tunnel.t | 14 ++++++++++++++
>> 2 files changed, 40 insertions(+)
>> create mode 100755 testsuite/tests/ip/tunnel/add_tunnel.t
>>
>> diff --git a/ip/iptunnel.c b/ip/iptunnel.c
>> index 78fa988..7826a37 100644
>> --- a/ip/iptunnel.c
>> +++ b/ip/iptunnel.c
>> @@ -629,8 +629,34 @@ static int do_6rd(int argc, char **argv)
>>  	return tnl_6rd_ioctl(cmd, medium, &ip6rd);
>>  }
>>
>> +static int tunnel_mode_is_ipv6(char *tunnel_mode) {
>> +	char *ipv6_modes[] = {
>> +		"ipv6/ipv6", "ip6ip6",
>> +		"vti6",
>> +		"ip/ipv6", "ipv4/ipv6", "ipip6", "ip4ip6",
>> +		"ip6gre", "gre/ipv6",
>> +		"any/ipv6", "any"
>> +	};
>> +	int i;
>> +
>> +	for (i = 0; i < sizeof(ipv6_modes) / sizeof(char *); i++) {
>> +		if (strcmp(ipv6_modes[i], tunnel_mode) == 0)
>> +			return 1;
>> +	}
>> +	return 0;
>> +}
>> +
>
> The ipv6_modes table should be static const.

Thank you for the note! Attached the corrected patch.

> Also is it possible to use strstr for ipv6 and ip6 or even strchr(tunnel_mode, '6')
> to simplify this?

There is IPv6 tunnel mode 'any', and IPv4 tunnel mode 'ipv6/ip' (aka 'sit').
It looks to me that attempts to find some substring match would not make the code
much shorter, but definitely less readable.

Konstantin Shemyak.

From 42d27db0055c3a114fe6eb86d680bef9ec098ad4 Mon Sep 17 00:00:00 2001
From: Konstantin Shemyak <konstantin@shemyak.com>
Date: Thu, 12 Nov 2015 20:52:02 +0200
Subject: [PATCH] Tunnel address family is determined from the tunnel mode

When the tunnel mode already tells the IP address family, "ip tunnel" command
determines it and does not require option "-4"/"-6" to be passed.
This makes possible creating both IPv4 and IPv6 tunnels in one batch.

Signed-off-by: Konstantin Shemyak <konstantin@shemyak.com>
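The corrected patch itself is not reproduced in the thread, so the following is only a sketch of how the review comment (a static const table) could be applied to the quoted function. The mode list is copied from the quoted diff; the const-qualified parameter and the size_t loop index are additional assumptions:

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if the tunnel mode implies AF_INET6, 0 otherwise. */
static int tunnel_mode_is_ipv6(const char *tunnel_mode)
{
	/* static const: the table is built once and never modified. */
	static const char * const ipv6_modes[] = {
		"ipv6/ipv6", "ip6ip6",
		"vti6",
		"ip/ipv6", "ipv4/ipv6", "ipip6", "ip4ip6",
		"ip6gre", "gre/ipv6",
		"any/ipv6", "any"
	};
	size_t i;

	for (i = 0; i < sizeof(ipv6_modes) / sizeof(ipv6_modes[0]); i++) {
		if (strcmp(ipv6_modes[i], tunnel_mode) == 0)
			return 1;
	}
	return 0;
}
```

The exact-match table also shows why a substring test would misfire: 'any' contains no '6' yet is an IPv6 mode, while 'ipv6/ip' contains "ipv6" yet is an IPv4 mode.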
-
Daniel Borkmann authored
I've added three examples to examples/bpf/ that demonstrate how one can implement eBPF tail calls in tc with f.e. multiple levels of nesting. That should act as a good starting point, but also as test cases for the ELF loader and kernel. A real test suite for {f,m,e}_bpf is still to be developed in future work. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org>
-
Daniel Borkmann authored
Since we have all infrastructure in place now, allow atomic live updates on program arrays. This can be very useful e.g. in case programs that are being tail-called need to be replaced, f.e. when classifier functionality needs to be changed, new protocols added/removed during runtime, etc.

Thus, provide a way for in-place code updates, minimal example: Given is an object file cls.o that contains the entry point in section 'classifier', has a globally pinned program array 'jmp' with 2 slots and id of 0, and two tail called programs under section '0/0' (prog array key 0) and '0/1' (prog array key 1), the section encoding for the loader is <id/key>. Adding the filter loads everything into cls_bpf:

  tc filter add dev foo parent ffff: bpf da obj cls.o

Now, the program under section '0/1' needs to be replaced with an updated version that resides in the same section (also full path to tc's subfolder of the mount point can be passed, e.g. /sys/fs/bpf/tc/globals/jmp):

  tc exec bpf graft m:globals/jmp obj cls.o sec 0/1

In case the program resides under a different section 'foo', it can also be injected into the program array like:

  tc exec bpf graft m:globals/jmp key 1 obj cls.o sec foo

If the new tail called classifier program is already available as a pinned object somewhere (here: /sys/fs/bpf/tc/progs/parser), it can be injected into the prog array like:

  tc exec bpf graft m:globals/jmp key 1 fd m:progs/parser

In the kernel, the program on key 1 is being atomically replaced and the old one's refcount dropped.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
-
Daniel Borkmann authored
The recently introduced object pinning can be further extended in order to allow sharing maps beyond tc namespace. F.e. maps that are being pinned from tracing side, can be accessed through this facility as well. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org>
-
Daniel Borkmann authored
Make use of the new show_fdinfo() facility and verify that when a pinned map is being fetched that its basic attributes are the same as the map we declared from the ELF file. I.e. when placed into the globalns, collisions could occur. In such a case warn the user and bail out. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org>
-
Daniel Borkmann authored
Now that we have the possibility of sharing maps, it's time we get the ELF loader fully working with regards to tail calls. Since program array maps are pinned, we can keep them finally alive. I've noticed two bugs that are being fixed in bpf_fill_prog_arrays() with this patch. Example code comes as follow-up. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org>
-
Stephen Hemminger authored
-
Tom Herbert authored
This patch adds support for remote checksum offload to VXLAN. It adds remcsumtx and remcsumrx to the ip vxlan configuration to enable remote checksum offload for transmit and receive on the VXLAN tunnel.

https://tools.ietf.org/html/draft-herbert-vxlan-rco-00

Example:

  ip link add name vxlan0 type vxlan id 42 group 239.1.1.1 dev eth0 \
      udpcsum remcsumtx remcsumrx

Testing: Ran single netperf over mlnx4 to illustrate the effect:
  - Without RCO (UDP csum set to zero): 4335.99 Mbps
  - With RCO enabled: 7661.81 Mbps

Signed-off-by: Tom Herbert <tom@herbertland.com>
-
Phil Sutter authored
fgets() will read at most size-1 bytes into the buffer and add a terminating null-char at the end. Therefore it is not necessary to pass a reduced buffer size when calling it. This change was generated using the following semantic patch:

  @@
  identifier buf, fp;
  @@
  - fgets(buf, sizeof(buf) - 1, fp)
  + fgets(buf, sizeof(buf), fp)

Signed-off-by: Phil Sutter <phil@nwl.cc>
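A short sketch of the fgets() behaviour the message relies on: with an 8-byte buffer and sizeof(buf) passed as the size, at most 7 characters are stored, followed by the terminating null byte. The helper name first_chunk_length() and the buffer size are chosen only for this illustration:

```c
#include <stdio.h>
#include <string.h>

/* Read the start of the next line from fp into a small local buffer and
 * return how many characters were stored, or -1 on EOF or error. */
static int first_chunk_length(FILE *fp)
{
	char buf[8];

	/* Passing the full size is safe: fgets() stores at most
	 * sizeof(buf) - 1 characters and then appends '\0'. */
	if (!fgets(buf, sizeof(buf), fp))
		return -1;
	return (int)strlen(buf);
}
```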
-
Phil Sutter authored
Although not fundamentally necessary to check return codes in these spots, preventing the warnings will put new ones into focus. Signed-off-by: Phil Sutter <phil@nwl.cc>
-
Phil Sutter authored
No need to keep static port boundaries global, they are not used directly. Keeping them local also allows their names to be safely reduced to the minimum. Assign hardcoded fallback values also if fscanf() fails. Get rid of unnecessary braces around return parameter. Instead of more or less duplicating is_ephemeral() in run_ssfilter(), simply call the function. Signed-off-by: Phil Sutter <phil@nwl.cc>
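A sketch of the pattern described, assuming the port range is read from a two-number text file. The path parameter and the fallback values 1024 and 4999 are assumptions made for this illustration, as is the function name is_ephemeral_sketch():

```c
#include <stdio.h>

/* Returns 1 if port lies inside the ephemeral range, 0 otherwise.
 * The range is read once from path and cached in function-local statics. */
static int is_ephemeral_sketch(int port, const char *path)
{
	static int min, max;	/* local, so short names are safe */

	if (!min) {
		FILE *f = fopen(path, "r");

		/* Fall back to hardcoded values if the file is missing
		 * or does not contain two numbers. */
		if (!f || fscanf(f, "%d %d", &min, &max) < 2) {
			min = 1024;	/* assumed fallback */
			max = 4999;	/* assumed fallback */
		}
		if (f)
			fclose(f);
	}
	return port >= min && port <= max;
}
```

Because the boundaries are cached on first use, later calls ignore the path argument; callers that need the check simply call the function instead of duplicating the comparison.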
-
Phil Sutter authored
Exit early or continue on error instead of putting conditional into conditional to make reading the code a bit easier. Also, the call to memcpy() can be skipped by initialising prog with the desired prefix. Signed-off-by: Phil Sutter <phil@nwl.cc>
-
Phil Sutter authored
Instead of calling rewind() and fgets() before every call to scan_lines(), move them into scan_lines() itself. This should also fix compat mode, as before the second call to scan_lines() the first line was skipped unconditionally. Signed-off-by: Phil Sutter <phil@nwl.cc>
-
Phil Sutter authored
  - Replace commas at end of subsection with dots.
  - Replace double whitespace by single one.

Signed-off-by: Phil Sutter <phil@nwl.cc>
-
Phil Sutter authored
Technically, the range of possible hoplimit values is defined by the IPv4 and IPv6 header formats. Both define the field to be eight bits in size, which leads to a value range of [0;255]. Setting a packet's hoplimit field to 0 does not make much sense though, as the next hop would immediately drop the packet. Therefore Linux uses 0 as a special value indicating to use the system's default hoplimit (configurable via sysctl). In iproute, setting the hoplimit of a route to 0 is equivalent to omitting the hoplimit parameter altogether, so it is actually not necessary to allow that value to be specified, but keep it anyway for backwards compatibility. Signed-off-by: Phil Sutter <phil@nwl.cc>
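The range check can be sketched as follows. This is a hypothetical helper, parse_hoplimit_sketch(), not the actual iproute2 parsing code; it accepts 0 through 255 and leaves the interpretation of 0 as "use the system default" to the caller:

```c
#include <errno.h>
#include <stdlib.h>

/* Parse s as a hoplimit. Returns 0 and stores the value in *out if s is
 * an integer in [0, 255]; returns -1 otherwise. */
static int parse_hoplimit_sketch(const char *s, unsigned int *out)
{
	char *end;
	unsigned long v;

	errno = 0;
	v = strtoul(s, &end, 0);
	/* Reject empty input, trailing garbage, overflow, and values that
	 * do not fit the eight-bit header field. A leading '-' wraps to a
	 * large unsigned value and is rejected by the range check. */
	if (errno || end == s || *end != '\0' || v > 255)
		return -1;

	*out = (unsigned int)v;
	return 0;
}
```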
-
Phil Sutter authored
Since it uses only a single filter, rtnl_dump_filter() can be used. Signed-off-by: Phil Sutter <phil@nwl.cc>
-
Phil Sutter authored
Right after ipaddr_reset_filter(), filter.family is always AF_UNSPEC. Signed-off-by: Phil Sutter <phil@nwl.cc>
-
Phil Sutter authored
Linux version 3.1 introduced a consistency check for netlink dumps in commit 670dc28 ("netlink: advertise incomplete dumps"). This bites iproute2 when flushing more addresses than can fit into a single RTM_GETADDR response. To silence the spurious error message "Dump was interrupted and may be inconsistent.", advise rtnl_dump_filter_l() to not care about NLM_F_DUMP_INTR. Signed-off-by: Phil Sutter <phil@nwl.cc>
-
Phil Sutter authored
Allow for a filter to ignore certain nlmsg_flags. Signed-off-by: Phil Sutter <phil@nwl.cc>
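One way to picture this is a per-filter mask of flags that the dump loop should not complain about. The struct and function names below are hypothetical, and the constant is a local copy that mirrors the value of NLM_F_DUMP_INTR (0x10) from linux/netlink.h so the sketch builds without kernel headers:

```c
#include <stdint.h>

/* Local copy for the sketch; same value as NLM_F_DUMP_INTR. */
#define SKETCH_F_DUMP_INTR 0x10

/* Hypothetical filter descriptor: ignore_flags lists the nlmsg_flags
 * bits this filter asks the dump loop to disregard. */
struct dump_filter_sketch {
	uint16_t ignore_flags;
};

/* Returns 1 if an "interrupted dump" warning should be reported for a
 * message carrying nlmsg_flags, 0 if the flag is absent or ignored. */
static int dump_intr_should_warn(uint16_t nlmsg_flags,
				 const struct dump_filter_sketch *f)
{
	uint16_t effective = nlmsg_flags & (uint16_t)~f->ignore_flags;

	return (effective & SKETCH_F_DUMP_INTR) != 0;
}
```

A flush operation that expects the address list to change underneath it would set the ignore mask, while ordinary dumps would leave it at zero and keep the warning.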
-
Phil Sutter authored
Since it's no longer relevant whether an IP address is primary or secondary when flushing, ipaddr_flush() can be simplified a bit. Signed-off-by: Phil Sutter <phil@nwl.cc>
-
Stephen Hemminger authored
Cleanup all checkpatch complaints about whitespace in rt_names.
-
David Ahern authored
Add support for reading table id/name mappings from rt_tables.d directory. Suggested-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
-
- 24 Nov, 2015 3 commits
-
-
John W. Linville authored
Signed-off-by: John W. Linville <linville@tuxdriver.com>
-
John W. Linville authored
Signed-off-by: John W. Linville <linville@tuxdriver.com>
-
Daniel Borkmann authored
This larger work addresses one of the bigger remaining issues on tc's eBPF frontend, that is, to allow for persistent file descriptors. Whenever tc parses the ELF object, extracts and loads maps into the kernel, these file descriptors will be out of reach after the tc instance exits.

Meaning, for simple (unnested) programs which contain one or multiple maps, the kernel holds a reference, and they will live on inside the kernel until the program holding them is unloaded, but they will be out of reach for user space, even worse with (also multiple nested) tail calls.

For this issue, we introduced the concept of an agent that can receive the set of file descriptors from the tc instance creating them, in order to be able to further inspect/update map data for a specific use case. However, while that is more tied towards specific applications, it still doesn't easily allow for sharing maps across multiple tc instances and would require a daemon to be running in the background. F.e. when a map should be shared by two eBPF programs, one attached to ingress, one to egress, this currently doesn't work with the tc frontend.

This work solves exactly that, i.e. if requested, maps can now be _arbitrarily_ shared between object files (PIN_GLOBAL_NS) or within a single object (but various program sections, PIN_OBJECT_NS) without "losing" the file descriptor set. To make that happen, we use eBPF object pinning introduced in kernel commit b2197755b263 ("bpf: add support for persistent maps/progs") for exactly this purpose.

The shipped examples/bpf/bpf_shared.c code from this patch can be easily applied, for instance, as:

  - classifier-classifier shared:

    tc filter add dev foo parent 1: bpf obj shared.o sec egress
    tc filter add dev foo parent ffff: bpf obj shared.o sec ingress

  - classifier-action shared (here: late binding to a dummy classifier):

    tc actions add action bpf obj shared.o sec egress pass index 42
    tc filter add dev foo parent ffff: bpf obj shared.o sec ingress
    tc filter add dev foo parent 1: bpf bytecode '1,6 0 0 4294967295,' \
       action bpf index 42

The toy example increments a shared counter on egress and dumps its value on ingress (if no sharing (PIN_NONE) would have been chosen, map value is 0, of course, due to the two map instances being created):

  [...]
  <idle>-0 [002] ..s. 38264.788234: : map val: 4
  <idle>-0 [002] ..s. 38264.788919: : map val: 4
  <idle>-0 [002] ..s. 38264.789599: : map val: 5
  [...]

... thus if both sections reference the pinned map(s) in question, tc will take care of fetching the appropriate file descriptor.

The patch has been tested extensively on both classifier and action sides.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
- 23 Nov, 2015 2 commits
-
-
Neil Horman authored
I found recently that, if I disabled address promotion in the kernel, ip addr flush dev <dev> would fail with an EADDRNOTAVAIL errno (though the flush operation would in fact flush all addresses from an interface properly).

What's happening is that, if I add a primary and multiple secondary addresses to an interface, the flush operation first enumerates them all with a GETADDR | DUMP operation, then sends a delete request for each address. But the kernel, having promotion disabled, deletes all secondary addresses when the primary is removed. That means that several delete requests may still be pending in the netlink request for addresses that have already been removed on our behalf, resulting in EADDRNOTAVAIL return codes.

It seems the simplest thing to do is to understand that EADDRNOTAVAIL isn't a fatal outcome on a flush operation, as it just indicates that an address which you want to remove is already removed, so it can safely be ignored.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Stephen Hemminger <stephen@networkplumber.org>
CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
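The decision described here can be sketched as a small predicate applied to each delete acknowledgement during a flush. The function name flush_error_is_fatal() is hypothetical, and the sketch assumes the error is passed as a positive errno value:

```c
#include <errno.h>

/* Decide whether an error returned for one address deletion should abort
 * a flush. err is a positive errno value, or 0 for success. */
static int flush_error_is_fatal(int err)
{
	/* EADDRNOTAVAIL only means the address is already gone, for
	 * example because the kernel removed a secondary address together
	 * with its primary. */
	if (err == 0 || err == EADDRNOTAVAIL)
		return 0;
	return 1;
}
```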
-
Phil Sutter authored
Despite commit 45a82e5 ("iproute vxlan add support for fdb replace command"), the 'fdb replace' command was not mentioned in bridge.8. Signed-off-by: Phil Sutter <phil@nwl.cc>
-