- 10 Feb, 2017 40 commits
-
-
tcharding authored
This patch fixes multiple occurrences of checkpatch WARNING: Missing a blank line after declarations. Signed-off-by: Tobin C. Harding <me@tobin.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
-
tcharding authored
Fix multiple occurrences of checkpatch warning. WARNING: Block comments use * on subsequent lines. Also make comment blocks more uniform. Signed-off-by: Tobin C. Harding <me@tobin.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
-
tcharding authored
This patch fixes two trivial whitespace errors. Brace should be on the previous line and trailing statements should be on next line. Signed-off-by: Tobin C. Harding <me@tobin.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
-
tcharding authored
This patch fixes multiple occurrences of space before tabs warnings. More lines of code were moved than required to keep kernel-doc comments uniform. Signed-off-by: Tobin C. Harding <me@tobin.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Amir Vadai says: ==================== net/sched: act_pedit: Use offset relative to conventional network headers Some FW/HW parser APIs are such that they need to get the specific header type (e.g IPV4 or IPV6, TCP or UDP) and not only the networking level (e.g network or transport). Enhancing the UAPI to allow for specifying that, would allow the same flows to be set into both SW and HW. This patchset also makes pedit more robust. Currently fields offset is specified by offset relative to the ip header, while using negative offsets for MAC layer fields. This series enables the user to set offset relative to the relevant header. Usage example: $ tc filter add dev enp0s9 protocol ip parent ffff: \ flower \ ip_proto tcp \ dst_port 80 \ action \ pedit munge ip ttl add 0xff \ pedit munge tcp dport set 8080 \ pipe action mirred egress redirect dev veth0 Will forward traffic destined to tcp dport 80, while modifying the destination port to 8080, and decreasing the ttl by one. I've uploaded a draft for the userspace [2] to make it easier to review and test the patchset. [1] - http://patchwork.ozlabs.org/patch/700909/ [2] - git: https://bitbucket.org/av42/iproute2.git branch: pedit Patchset was tested and applied on top of upstream commit bd092ad1 ("Merge branch 'remove-__napi_complete_done'") Thanks, Amir Changes since V2: - Instead of reusing unused bits in existing uapi fields, using new netlink attributes for the new information. This way new/old user space and new/old kernel can live together without having misunderstandings. Changes since V1: - No changes - V1 was sent and didn't make it for 4.10. - You asked me [1] why did I use specific header names instead of layers (L2, L3...), and I explained that it is on purpose, this extra information is planned to be used by hardware drivers to offload the action. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Amir Vadai authored
This command could be useful to inc/dec fields. For example, to forward any TCP packet and decrease its TTL: $ tc filter add dev enp0s9 protocol ip parent ffff: \ flower ip_proto tcp \ action pedit munge ip ttl add 0xff pipe \ action mirred egress redirect dev veth0 In the example above, adding 0xff to this u8 field is actually decreasing it by one, since the operation is masked. Signed-off-by: Amir Vadai <amir@vadai.me> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Amir Vadai authored
Extend pedit to enable the user setting offset relative to network headers. This change would enable to work with more complex header schemes (vs the simple IPv4 case) where setting a fixed offset relative to the network header is not enough. After this patch, the action has information about the exact header type and field inside this header. This information could be used later on for hardware offloading of pedit. Backward compatibility was being kept: 1. Old kernel <-> new userspace 2. New kernel <-> old userspace 3. add rule using new userspace <-> dump using old userspace 4. add rule using old userspace <-> dump using new userspace When using the extended api, new netlink attributes are being used. This way, operation will fail in (1) and (3) - and no malformed rule be added or dumped. Of course, new user space that doesn't need the new functionality can use the old netlink attributes and operation will succeed. Since action can support both api's, (2) should work, and it is easy to write the new user space to have (4) work. The action is having a strict check that only header types and commands it can handle are accepted. This way future additions will be much easier. Usage example: $ tc filter add dev enp0s9 protocol ip parent ffff: \ flower \ ip_proto tcp \ dst_port 80 \ action pedit munge tcp dport set 8080 pipe \ action mirred egress redirect dev veth0 Will forward tcp port whose original dest port is 80, while modifying the destination port to 8080. Signed-off-by: Amir Vadai <amir@vadai.me> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Amir Vadai authored
Introduce skb_mac_offset() that could be used to get mac header offset. Signed-off-by: Amir Vadai <amir@vadai.me> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Jiri Pirko says: ==================== mlxsw: Offload MC flood for unregister MC Nogah says: When multicast is enabled, the Linux bridge floods unregistered multicast packets only to ports connected to a multicast router. Devices capable of offloading the Linux bridge need to be made aware of such ports, for proper flooding behavior. On the other hand, when multicast is disabled, such packets should be flooded to all ports. This patchset aims to fix that, by offloading the multicast state and the list of multicast router ports. The first 3 patches adds switchdev attributes to offload this data. The rest of the patchset add implementation for handling this data in the mlxsw driver. The effects this data has on the MDB (namely, when the multicast is disabled the MDB should be considered as invalid, and when it is enabled, a packet that is flooded by it should also be flooded to the multicast routers ports) is subject of future work. Testing of this patchset included: Sending 3 mc packets streams, LL, register and unregistered, and checking that they reached only to the ports that should have received them. The configs were: mc disabled, mc without mc router ports and mc with fixed router port. It was checked for vlan aware bridge, vlan unaware bridge and vlan unaware bridge with another vlan unaware bridge on the same machine ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nogah Frankel authored
Add a function to update mc_disabled from switchdev attr SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nogah Frankel authored
The function mlxsw_sp_port_orig_get returns the vport from the physical port if needed, based on the original device. This patch addresses the case where the original device is a bridge. If it is vlan unaware bridge, it returns the matching vport. If it is vlan aware bridge, there is no matching vport, and it returns the original port. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nogah Frankel authored
The decision whether to flood a multicast packet to a port dependent on three flags: mc_disabled, mc_router_port, mc_flood. If mc_disabled is on, the port will be flooded according to mc_flood, otherwise, according to mc_router_port. To accomplish that, add those flags into the mlxsw_sp_port struct and update the mc flood table accordingly. Update mc_router_port by switchdev attribute SWITCHDEV_ATTR_ID_PORT_MC_ROUTER_PORT. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nogah Frankel authored
Break the bm (broadcast-multicast) into two tables, one for broadcast (and link local multicast that behaves like bc) and one for unknown multicasts. Add a bool into mlxsw_sp_port named mc_flood that reflect the value this port should have in the mc flood table (currently, always 1); Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nogah Frankel authored
A user that wants many bridges will use 1.Q bridge which are scalable. One can have as many 1.Q bridges as vfids. This patch sets their number to 1k, which is a reasonably large number. This change is done here because the next patches will add a new flood table, and without it, it will increase the overall size of the flood tables dramatically. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nogah Frankel authored
Currently, there is a per port flood update function only for the UC table. Make the function more generic by changing the table type to be an input. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nogah Frankel authored
Currently, the flood set function can't operate on only one table, but sets both uc_flood and mb_flood together. This patch creates a function that sets the flood state per table. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nogah Frankel authored
Offload the mc router ports list, whenever it is being changed. It is done because in some cases mc packets needs to be flooded to all the ports in this list. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nogah Frankel authored
There are three places where a port gets deleted from the mc router port list. This patch join the actual deletion to one function. It will be helpful for later patch that will offload changes in the mc router ports list. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nogah Frankel authored
Offload multicast disabled flag, for more accurate mc flood behavior: When it is on, the mdb should be ignored. When it is off, unregistered mc packets should be flooded to mc router ports. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Jiri Pirko says: ==================== sched: cls_api: small cleanup This patchset makes couple of things in cls_api code a bit nicer and easier for reader to digest. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
As it is more common, check err for !0. That allows to safe one level of indentation and makes the code easier to read. Also, make 'next' variable global in function as it is used twice. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
Curly braces need to be there, for stylistic reasons. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
This makes the reader to know right away what is the error value. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
Make the long function tc_ctl_tfilter a little bit shorter and easier to read. Also make the creation of filter proto symmetric to destruction. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
Creation is done in this file, move destruction to be at the same place. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
This function destroys TC filter protocol, not TC filter. So name it accordingly. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Jiri Pirko says: ==================== mlxsw: Identical routes handling Ido says: The kernel can store several FIB aliases that share the same prefix and length. These aliases can differ in other parameters such as TOS and metric, which are taken into account during lookup. Offloading devices might not have the same flexibility, allowing only a single route with the same prefix and length to be reflected. mlxsw is one such device. This patchset aims to correctly handle this situation in the mlxsw driver. The first four patches introduce small changes in the IPv4 FIB code, so that listeners of the FIB notification chain will be able to correctly handle identical routes. The last three patches build on top of previous work and introduce the necessary changes in the mlxsw driver. The biggest change is the introduction of a FIB node, where identical routes are chained, instead of a primitive reference counting. This is explained in detail in the fifth patch. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
Upon the reception of an ENTRY_REPLACE notification, resolve the FIB node corresponding to the prefix and length and insert the new route before the first matching entry. Since the notification also signals the deletion of the replaced route, delete it from the driver's cache. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
When a new route is appended, it's placed after existing routes sharing the same parameters (prefix, length, table ID, TOS and priority). While the device supports only one route with the same prefix and length in a single table, it's important to correctly place the appended route in the driver's cache, as when a route is deleted the next one is programmed into the device. Following the reception of an ENTRY_APPEND notification, resolve the FIB node corresponding to the prefix and length and correctly place the new entry in its entry list. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
In the device, routes are indexed in a routing table based on the prefix and its length. This is in contrast to the kernel's FIB where several FIB aliases can exist with these parameters being identical. In such cases, the routes will be sorted by table ID (LOCAL first, then MAIN), TOS and finally priority (metric). During lookup, these routes will be evaluated in order. In case the packet's TOS field is non-zero and a FIB alias with a matching TOS is found, then it's selected. Otherwise, the lookup defaults to the route with TOS 0 (if it exists). However, if the requested scope is narrower than the one found, then the lookup continues. To best reflect the kernel's datapath we should take the above into account. Given a prefix and its length, the reflected route will always be the first one in the FIB alias list. However, if the route has a non-zero TOS then its action will be converted to trap instead of forward, since we currently don't support TOS-based routing. If this turns out to be a real issue, we can add support for that using policy-based switching. The route's scope can be effectively ignored as any packet being routed by the device would've been looked-up using the widest scope (UNIVERSE). To achieve that we need to do two changes. Firstly, we need to create another struct (FIB node) that will hold the list of FIB entries sharing the same prefix and length. This struct will be hashed using these two parameters. Secondly, we need to change the route reflection to match the above logic, so that the first FIB entry in the list will be programmed into the device while the rest will remain in the driver's cache in case of subsequent changes. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
The FIB notification chain currently uses the NLM_F_{REPLACE,APPEND} flags to signal routes being replaced or appended. Instead of using netlink flags for in-kernel notifications we can simply introduce two new events in the FIB notification chain. This has the added advantage of making the API cleaner, thereby making it clear that these events should be supported by listeners of the notification chain. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> CC: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
When a FIB alias is replaced following NLM_F_REPLACE, the ENTRY_ADD notification is sent after the reference on the previous FIB info was dropped. This is problematic as potential listeners might need to access it in their notification blocks. Solve this by sending the notification prior to the deletion of the replaced FIB alias. This is consistent with ENTRY_DEL notifications. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> CC: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
When a FIB alias is removed, a notification is sent using the type passed from user space - can be RTN_UNSPEC - instead of the actual type of the removed alias. This is problematic for listeners of the FIB notification chain, as several FIB aliases can exist with matching parameters, but the type. Solve this by passing the actual type of the removed FIB alias. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> CC: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
In case the MAIN table is flushed and its trie is shared with the LOCAL table, then we might be flushing FIB aliases belonging to the latter. This can lead to FIB_ENTRY_DEL notifications sent with the wrong table ID. The above doesn't affect current listeners, as the table ID is ignored during entry deletion, but this will change later in the patchset. When flushing a particular table, skip any aliases belonging to a different one. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> CC: Alexander Duyck <alexander.h.duyck@intel.com> CC: Patrick McHardy <kaber@trash.net> Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
-
Jarno Rajahalme authored
struct sw_flow_key has two 16-bit holes. Move the most matched conntrack match fields there. In some typical cases this reduces the size of the key that needs to be hashed into half and into one cache line. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jarno Rajahalme authored
Stateful network admission policy may allow connections to one direction and reject connections initiated in the other direction. After policy change it is possible that for a new connection an overlapping conntrack entry already exists, where the original direction of the existing connection is opposed to the new connection's initial packet. Most importantly, conntrack state relating to the current packet gets the "reply" designation based on whether the original direction tuple or the reply direction tuple matched. If this "directionality" is wrong w.r.t. to the stateful network admission policy it may happen that packets in neither direction are correctly admitted. This patch adds a new "force commit" option to the OVS conntrack action that checks the original direction of an existing conntrack entry. If that direction is opposed to the current packet, the existing conntrack entry is deleted and a new one is subsequently created in the correct direction. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jarno Rajahalme authored
Add the fields of the conntrack original direction 5-tuple to struct sw_flow_key. The new fields are initially marked as non-existent, and are populated whenever a conntrack action is executed and either finds or generates a conntrack entry. This means that these fields exist for all packets that were not rejected by conntrack as untrackable. The original tuple fields in the sw_flow_key are filled from the original direction tuple of the conntrack entry relating to the current packet, or from the original direction tuple of the master conntrack entry, if the current conntrack entry has a master. Generally, expected connections of connections having an assigned helper (e.g., FTP), have a master conntrack entry. The main purpose of the new conntrack original tuple fields is to allow matching on them for policy decision purposes, with the premise that the admissibility of tracked connections reply packets (as well as original direction packets), and both direction packets of any related connections may be based on ACL rules applying to the master connection's original direction 5-tuple. This also makes it easier to make policy decisions when the actual packet headers might have been transformed by NAT, as the original direction 5-tuple represents the packet headers before any such transformation. When using the original direction 5-tuple the admissibility of return and/or related packets need not be based on the mere existence of a conntrack entry, allowing separation of admission policy from the established conntrack state. While existence of a conntrack entry is required for admission of the return or related packets, policy changes can render connections that were initially admitted to be rejected or dropped afterwards. If the admission of the return and related packets was based on mere conntrack state (e.g., connection being in an established state), a policy change that would make the connection rejected or dropped would need to find and delete all conntrack entries affected by such a change. When using the original direction 5-tuple matching the affected conntrack entries can be allowed to time out instead, as the established state of the connection would not need to be the basis for packet admission any more. It should be noted that the directionality of related connections may be the same or different than that of the master connection, and neither the original direction 5-tuple nor the conntrack state bits carry this information. If needed, the directionality of the master connection can be stored in master's conntrack mark or labels, which are automatically inherited by the expected related connections. The fact that neither ARP nor ND packets are trackable by conntrack allows mutual exclusion between ARP/ND and the new conntrack original tuple fields. Hence, the IP addresses are overlaid in union with ARP and ND fields. This allows the sw_flow_key to not grow much due to this patch, but it also means that we must be careful to never use the new key fields with ARP or ND packets. ARP is easy to distinguish and keep mutually exclusive based on the ethernet type, but ND being an ICMPv6 protocol requires a bit more attention. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jarno Rajahalme authored
We avoid calling into nf_conntrack_in() for expected connections, as that would remove the expectation that we want to stick around until we are ready to commit the connection. Instead, we do a lookup in the expectation table directly. However, after a successful expectation lookup we have set the flow key label field from the master connection, whereas nf_conntrack_in() does not do this. This leads to master's labels being inherited after an expectation lookup, but those labels not being inherited after the corresponding conntrack action with a commit flag. This patch resolves the problem by changing the commit code path to also inherit the master's labels to the expected connection. Resolving this conflict in favor of inheriting the labels allows more information be passed from the master connection to related connections, which would otherwise be much harder if the 32 bits in the connmark are not enough. Labels can still be set explicitly, so this change only affects the default values of the labels in presense of a master connection. Fixes: 7f8a436e ("openvswitch: Add conntrack action") Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jarno Rajahalme authored
Refactoring conntrack labels initialization makes changes in later patches easier to review. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-