- 21 Jul, 2015 40 commits
-
-
David S. Miller authored
Thomas Graf says: ==================== Lightweight & flow based encapsulation This series combines the work previously posted by Roopa, Robert and myself. It's according to what we discussed at NFWS. The motivation of this series is to: * Consolidate code between OVS and the rest of the kernel and get rid of OVS vports and instead represent them as pure net_devices. * Introduce a lightweight tunneling mechanism which enables flow based encapsulation to improve scalability on both RX and TX. * Do the above in an encapsulation unspecific way so that the encapsulation type is eventually abstracted away from the user. * Use the same forwarding decision for both native forwarding and encapsulation thus allowing to switch between native IPv6 and UDP encapsulation based on endpoint without requiring additional logic The fundamental changes introduces in this series are: * A new RTA_ENCAP Netlink attribute for routes carrying encapsulation instructions. Depending on the specified type, the instructions apply to UDP encapsulations, MPLS and possible other in the future. * Depending on the encapsulation type, the output function of the dst is directly overwritten or the dst merely attaches metadata and relies on a subsequent net_device to apply it to the packet. The latter is typically used if an inner and outer IP header exist which require two subsequent routing lookups to be performed. * A new metadata_dst structure which can be attached to skbs to carry metadata in between subsystems. This new metadata transport is used to provide a single interface for VXLAN, routing and OVS to communicate through metadata. The OVS interfaces remain as-is but will transparently create a real VXLAN net_device in the background. iproute2 is extended with a new use cases: VXLAN: ip route add 40.1.1.1/32 encap vxlan id 10 dst 50.1.1.2 dev vxlan0 MPLS: ip route add 10.1.1.0/30 encap mpls 200 via inet 10.1.1.1 dev swp1 Performance implications: The additional memory allocation in the receive path should have performance implications although it is not observable in standard throughput tests if GRO is properly done. The correct net_device model outweights the additional cost of the allocation. Furthermore, this implication can be relaxed by reintroducing a direct unqueued path from a software device to a consumer like bridge or OVS if needed. $ netperf -t TCP_STREAM -H 15.1.1.201 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 15.1.1.201 (15.1.1.201) port 0 AF_INET : demo Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 10.00 9118.17 Changes since v1: * Properly initialize tun_id as reported by Julian * Drop dupliate netif_keep_dst() as reported by Alexei ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Thomas Graf authored
This gets rid of all OVS specific VXLAN code in the receive and transmit path by using a VXLAN net_device to represent the vport. Only a small shim layer remains which takes care of handling the VXLAN specific OVS Netlink configuration. Unexports vxlan_sock_add(), vxlan_sock_release(), vxlan_xmit_skb() since they are no longer needed. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Thomas Graf authored
This allows to get rid of the get_name() vport ops later on. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Thomas Graf authored
This is the first step in representing all OVS vports as regular struct net_devices. Move the net_device pointer into the vport structure itself to get rid of struct vport_netdev. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Thomas Graf authored
Utilize the new metadata dst to attach encapsulation instructions to the skb. The existing egress_tun_info via the OVS_CB() is left in place until all tunnel vports have been converted to the new method. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Thomas Graf authored
This factors out the device configuration out of the RTNL newlink API which allows for in-kernel creation of VXLAN net_devices. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Thomas Graf authored
This add the ability to select a routing table based on the tunnel id which allows to maintain separate routing tables for each virtual tunnel network. ip rule add from all tunnel-id 100 lookup 100 ip rule add from all tunnel-id 200 lookup 200 A new static key controls the collection of metadata at tunnel level upon demand. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Thomas Graf authored
This introduces a new IP tunnel lightweight tunnel type which allows to specify IP tunnel instructions per route. Only IPv4 is supported at this point. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Thomas Graf authored
Add a new flowi_tunnel structure which is a subset of ip_tunnel_key to allow routes to match on tunnel metadata. For now, the tunnel id is added to flowi_tunnel which allows for routes to be bound to specific virtual tunnels. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Thomas Graf authored
Allows putting a VXLAN device into a new flow-based mode in which skbs with a ip_tunnel_info dst metadata attached will be encapsulated according to the instructions stored in there with the VXLAN device defaults taken into consideration. Similar on the receive side, if the VXLAN_F_COLLECT_METADATA flag is set, the packet processing will populate a ip_tunnel_info struct for each packet received and attach it to the skb using the new metadata dst. The metadata structure will contain the outer header and tunnel header fields which have been stripped off. Layers further up in the stack such as routing, tc or netfitler can later match on these fields and perform forwarding. It is the responsibility of upper layers to ensure that the flag is set if the metadata is needed. The flag limits the additional cost of metadata collecting based on demand. This prepares the VXLAN device to be steered by the routing and other subsystems which allows to support encapsulation for a large number of tunnel endpoints and tunnel ids through a single net_device which improves the scalability. It also allows for OVS to leverage this mode which in turn allows for the removal of the OVS specific VXLAN code. Because the skb is currently scrubed in vxlan_rcv(), the attachment of the new dst metadata is postponed until after scrubing which requires the temporary addition of a new member to vxlan_metadata. This member is removed again in a later commit after the indirect VXLAN receive API has been removed. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Thomas Graf authored
If output device wants to see the dst, inherit the dst of the original skb and pass it on to generate the ARP request. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Thomas Graf authored
Introduces a new dst_metadata which enables to carry per packet metadata between forwarding and processing elements via the skb->dst pointer. The structure is set up to be a union. Thus, each separate type of metadata requires its own dst instance. If demand arises to carry multiple types of metadata concurrently, metadata dst entries can be made stackable. The metadata dst entry is refcnt'ed as expected for now but a non reference counted use is possible if the reference is forced before queueing the skb. In order to allow allocating dsts with variable length, the existing dst_alloc() is split into a dst_alloc() and dst_init() function. The existing dst_init() function to initialize the subsystem is being renamed to dst_subsys_init() to make it clear what is what. The check before ip_route_input() is changed to ignore metadata dsts and drop the dst inside the routing function thus allowing to interpret metadata in a later commit. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Thomas Graf authored
ip_route_input() unconditionally overwrites the dst. Hide the original dst attached to the skb by calling skb_dst_set(skb, NULL) prior to ip_route_input(). Reported-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Thomas Graf authored
Rename the tunnel metadata data structures currently internal to OVS and make them generic for use by all IP tunnels. Both structures are kernel internal and will stay that way. Their members are exposed to user space through individual Netlink attributes by OVS. It will therefore be possible to extend/modify these structures without affecting user ABI. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Roopa Prabhu authored
This implementation uses lwtunnel infrastructure to register hooks for mpls tunnel encaps. It picks cues from iptunnel_encaps infrastructure and previous mpls iptunnel RFC patches from Eric W. Biederman and Robert Shearman Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Roopa Prabhu authored
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Roopa Prabhu authored
This is similar to ipv4 redirect of dst output to lwtunnel output function for encapsulation and xmit. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Roopa Prabhu authored
For input routes with tunnel encap state this patch redirects dst output functions to lwtunnel_output which later resolves to the corresponding lwtunnel output function. This has been tested to work with mpls ip tunnels. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Roopa Prabhu authored
This patch introduces lwtunnel_output function to call corresponding lwtunnels output function to xmit the packet. It adds two variants lwtunnel_output and lwtunnel_output6 for ipv4 and ipv6 respectively today. But this is subject to change when lwtstate will reside in dst or dst_metadata (as per upstream discussions). Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Roopa Prabhu authored
This patch adds support in ipv6 fib functions to parse Netlink RTA encap attributes and attach encap state data to rt6_info. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Roopa Prabhu authored
This patch adds support in ipv4 fib functions to parse user provided encap attributes and attach encap state data to fib_nh and rtable. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Roopa Prabhu authored
Provides infrastructure to parse/dump/store encap information for light weight tunnels like mpls. Encap information for such tunnels is associated with fib routes. This infrastructure is based on previous suggestions from Eric Biederman to follow the xfrm infrastructure. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Roopa Prabhu authored
This patch introduces two new RTA attributes to attach encap data to fib routes. Example iproute2 command to attach mpls encap data to ipv4 routes $ip route add 10.1.1.0/30 encap mpls 200 via inet 10.1.1.1 dev swp1 Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Suggested-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Pieter Hollants authored
Added the USB IDs 0x413c:0x81b1 for the "Dell Wireless 5809e Gobi(TM) 4G LTE Mobile Broadband Card", a Dell-branded Sierra Wireless EM7305 LTE card in M.2 form factor, used eg. in Dell's Latitude E7540 Notebook series. Signed-off-by: Pieter Hollants <pieter@hollants.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Wilk authored
Signed-off-by: Jakub Wilk <jwilk@jwilk.net> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Anish Bhatt says: ==================== cxgb4 DCB updates The following patchset covers changes to work better with the userspace tools cgdcbxd and cgrulesengd and improves firmware support for host-managed mode. Also exports traffic class information that was previously not being exported via dcbnl_ops and unfifies how app selector information is passed to firmware. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Anish Bhatt authored
Signed-off-by: Anish Bhatt <anish@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Anish Bhatt authored
Signed-off-by: Anish Bhatt <anish@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Anish Bhatt authored
Since finally DCB traffic management is still handled by firmware, allow firmware to be fully programmed and queried even in host managed state for the cases where this was previously rejected. Signed-off-by: Anish Bhatt <anish@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Anish Bhatt authored
This keeps app format passed to firmware the same irrespective of DCBx version in use. Signed-off-by: Anish Bhatt <anish@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Marcelo Ricardo Leitner authored
Cookie ACK is always received by the association initiator, so fix the comment to avoid confusion. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Marcelo Ricardo Leitner says: ==================== sctp: fix src address selection if using secondary address This series improves the way SCTP chooses its src address so that the choosen one will always belong to the interface being used for output. v1->v2: - split out the refactoring from the fix itself - Doing a full reverse routing as in v1 is not necessary. Only looking for the interface that has the address and comparing its number is enough. ==================== Acked-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Marcelo Ricardo Leitner authored
In short, sctp is likely to incorrectly choose src address if socket is bound to secondary addresses. This patch fixes it by adding a new check that checks if such src address belongs to the interface that routing identified as output. This is enough to avoid rp_filter drops on remote peer. Details: Currently, sctp will do a routing attempt without specifying the src address and compare the returned value (preferred source) with the addresses that the socket is bound to. When using secondary addresses, this will not match. Then it will try specifying each of the addresses that the socket is bound to and re-routing, checking if that address is valid as src for that dst. Thing is, this check alone is weak: # ip r l 192.168.100.0/24 dev eth1 proto kernel scope link src 192.168.100.149 192.168.122.0/24 dev eth0 proto kernel scope link src 192.168.122.147 # ip a l 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000 link/ether 52:54:00:15:18:6a brd ff:ff:ff:ff:ff:ff inet 192.168.122.147/24 brd 192.168.122.255 scope global dynamic eth0 valid_lft 2160sec preferred_lft 2160sec inet 192.168.122.148/24 scope global secondary eth0 valid_lft forever preferred_lft forever inet6 fe80::5054:ff:fe15:186a/64 scope link valid_lft forever preferred_lft forever 3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000 link/ether 52:54:00:b3:91:46 brd ff:ff:ff:ff:ff:ff inet 192.168.100.149/24 brd 192.168.100.255 scope global dynamic eth1 valid_lft 2162sec preferred_lft 2162sec inet 192.168.100.148/24 scope global secondary eth1 valid_lft forever preferred_lft forever inet6 fe80::5054:ff:feb3:9146/64 scope link valid_lft forever preferred_lft forever 4: ens9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000 link/ether 52:54:00:05:47:ee brd ff:ff:ff:ff:ff:ff inet6 fe80::5054:ff:fe05:47ee/64 scope link valid_lft forever preferred_lft forever # ip r g 192.168.100.193 from 192.168.122.148 192.168.100.193 from 192.168.122.148 dev eth1 cache Even if you specify an interface: # ip r g 192.168.100.193 from 192.168.122.148 oif eth1 192.168.100.193 from 192.168.122.148 dev eth1 cache Although this would be valid, peers using rp_filter will drop such packets as their src doesn't match the routes for that interface. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Marcelo Ricardo Leitner authored
Paves the day for the next patch. Functionality stays untouched. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Sowmini Varadhan authored
__vxlan_find_mac invokes ether_addr_equal on the eth_addr field, which triggers unaligned access messages, so rearrange vxlan_fdb to avoid this in the most non-intrusive way. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Reviewed-by: Jiri Pirko <jiri@resnulli.us> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Thomas Graf authored
Depending on system speed, the large lookup/insert/delete loops of the testsuite can take a considerable amount of time to complete causing watchdog warnings to appear. Allow other tasks to be scheduled throughout the loops. Reported-by: Meelis Roos <mroos@linux.ee> Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Shaohui Xie authored
Teranetics TN2020 is compliant with IEEE 802.3an 10 Gigabit. Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Alexei Starovoitov says: ==================== bpf: introduce bpf_skb_vlan_push/pop() helpers Let TC+eBPF programs call skb_vlan_push/pop via helpers. v1->v2: - reworded commit log to better explain correctness of re-caching and fixed comparison of mixed endiannes (suggested by Eric) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alexei Starovoitov authored
improve accuracy of timing in test_bpf and add two stress tests: - {skb->data[0], get_smp_processor_id} repeated 2k times - {skb->data[0], vlan_push} x 68 followed by {skb->data[0], vlan_pop} x 68 1st test is useful to test performance of JIT implementation of BPF_LD_ABS together with BPF_CALL instructions. 2nd test is stressing skb_vlan_push/pop logic together with skb->data access via BPF_LD_ABS insn which checks that re-caching of skb->data is done correctly. In order to call bpf_skb_vlan_push() from test_bpf.ko have to add three export_symbol_gpl. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alexei Starovoitov authored
Allow eBPF programs attached to TC qdiscs call skb_vlan_push/pop via helper functions. These functions may change skb->data/hlen which are cached by some JITs to improve performance of ld_abs/ld_ind instructions. Therefore JITs need to recognize bpf_skb_vlan_push/pop() calls, re-compute header len and re-cache skb->data/hlen back into cpu registers. Note, skb->data/hlen are not directly accessible from the programs, so any changes to skb->data done either by these helpers or by other TC actions are safe. eBPF JIT supported by three architectures: - arm64 JIT is using bpf_load_pointer() without caching, so it's ok as-is. - x64 JIT re-caches skb->data/hlen unconditionally after vlan_push/pop calls (experiments showed that conditional re-caching is slower). - s390 JIT falls back to interpreter for now when bpf_skb_vlan_push() is present in the program (re-caching is tbd). These helpers allow more scalable handling of vlan from the programs. Instead of creating thousands of vlan netdevs on top of eth0 and attaching TC+ingress+bpf to all of them, the program can be attached to eth0 directly and manipulate vlans as necessary. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-