- 01 Mar, 2016 40 commits
-
-
David S. Miller authored
Vivien Didelot says:

====================
net: dsa: mv88e6xxx: implement VLAN filtering

This patchset fixes hardware bridging for non 802.1Q aware systems.

The mv88e6xxx DSA driver currently depends on CONFIG_VLAN_8021Q and CONFIG_BRIDGE_VLAN_FILTERING being enabled for correct bridging between switch ports.

Patch 1/9 adds support for the VLAN filtering switchdev attribute in DSA.

Patches 2/9 and 3/9 add helper functions for the following patches.

Patches 4/9 to 6/9 assign dynamic address databases to VLANs, ports, and bridge groups (the lowest available FID is cleared and assigned), and thus restore support for per-port FDB operations.

Patches 7/9 to 9/9 refine port isolation and set up 802.1Q on user demand.

With this patchset, ports get correctly bridged and the driver behaves as expected, with or without 802.1Q support.

With CONFIG_VLAN_8021Q enabled, setting a default PVID on the bridge correctly propagates the corresponding VLAN, in addition to the hardware bridging:

    # echo 42 > /sys/class/net/<bridge>/bridge/default_pvid

But with CONFIG_BRIDGE_VLAN_FILTERING enabled, the hardware VLAN filtering is enabled on all bridge members only when the user requests it:

    # echo 1 > /sys/class/net/<bridge>/bridge/vlan_filtering
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vivien Didelot authored
Implement port_vlan_filtering in the driver to toggle the related port 802.1Q mode between DISABLED and SECURE, on user request. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vivien Didelot authored
Now that port isolation is correctly configured when joining or leaving a bridge, there is no need to rely on reserved VLANs to isolate unbridged ports anymore. Thus remove them, and disable 802.1Q on setup. This restores the expected behavior of hardware bridging for systems without 802.1Q or VLAN filtering enabled. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vivien Didelot authored
The In Chip Port Based VLAN Table contains bits used to restrict which output ports an input port can send frames to. With VLAN filtering enabled, these tables work in conjunction with the VLAN Table Unit to allow egressing frames. In order to remove the current dependency on BRIDGE_VLAN_FILTERING for basic hardware bridging to work, it is necessary to restore fine-grained control of each port's VLANTable, on setup and when a port joins or leaves a bridge. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vivien Didelot authored
Give a new bridge a fresh FDB, assign it to its members, and restore a fresh FDB to a port leaving a bridge. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vivien Didelot authored
Restore per-port FDB. Assign them on setup, allow adding and deleting addresses into them, and dump them. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vivien Didelot authored
Add a _mv88e6xxx_fid_new function which gives and flushes the lowest FID available. Call it when preparing a new VTU entry. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
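The "lowest FID available" allocation can be sketched in plain user-space C. This is only an illustration under stated assumptions, not the driver's code: `sketch_fid_new`, `SKETCH_NUM_FIDS`, and the boolean in-use array are hypothetical names, and the hardware flush of the chosen database is reduced to a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical table size; the real number of FIDs depends on the switch. */
#define SKETCH_NUM_FIDS 16

/*
 * Return the lowest FID that is not in use and mark it taken,
 * or -1 when every FID is already assigned.
 */
static int sketch_fid_new(bool in_use[SKETCH_NUM_FIDS])
{
	assert(in_use != NULL);

	for (int fid = 0; fid < SKETCH_NUM_FIDS; fid++) {
		if (!in_use[fid]) {
			in_use[fid] = true;
			/* A real driver would flush the address database here. */
			return fid;
		}
	}
	return -1;
}
```

Scanning from the bottom keeps the assigned FIDs dense, which is one plausible reason to prefer the lowest free value over an arbitrary one.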
-
Vivien Didelot authored
Move out the code which dumps a single FDB to its own function. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vivien Didelot authored
Rename _mv88e6xxx_vlan_init to _mv88e6xxx_vtu_new, which is eventually called from a new _mv88e6xxx_vtu_get function. The latter abstracts the VTU GetNext VID-1 trick to retrieve a single entry. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
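The "GetNext VID-1 trick" can be illustrated with a small simulation. The sketch assumes that GetNext returns the valid entry with the lowest VID strictly greater than the one given, so asking for the successor of VID-1 yields the entry for VID if it exists. The struct and both function names are hypothetical, and the table is an in-memory array rather than switch registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified VTU entry. */
struct sketch_vtu_entry {
	uint16_t vid;
	bool valid;
};

/*
 * Simulated GetNext: return the valid entry with the lowest VID that is
 * strictly greater than 'after', or NULL if there is none.
 */
static const struct sketch_vtu_entry *
sketch_vtu_getnext(const struct sketch_vtu_entry *table, size_t n,
		   uint16_t after)
{
	const struct sketch_vtu_entry *best = NULL;

	for (size_t i = 0; i < n; i++) {
		if (!table[i].valid || table[i].vid <= after)
			continue;
		if (best == NULL || table[i].vid < best->vid)
			best = &table[i];
	}
	return best;
}

/* Fetch exactly 'vid' by asking for the entry that follows vid - 1. */
static const struct sketch_vtu_entry *
sketch_vtu_get(const struct sketch_vtu_entry *table, size_t n, uint16_t vid)
{
	const struct sketch_vtu_entry *next;

	assert(vid >= 1);
	next = sketch_vtu_getnext(table, n, (uint16_t)(vid - 1));
	return (next != NULL && next->vid == vid) ? next : NULL;
}
```

The equality check after the lookup is what turns an iteration primitive into a point query: if GetNext skips past the requested VID, the entry does not exist.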
-
Vivien Didelot authored
When a user explicitly requests VLAN filtering with something like:

    # echo 1 > /sys/class/net/<bridge>/bridge/vlan_filtering

Switchdev propagates a SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING port attribute. Add support for it in the DSA layer with a new port_vlan_filtering function to let drivers toggle 802.1Q filtering on user demand.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Jiri Pirko says:

====================
Introduce devlink interface and first drivers to use it

There is a need for some userspace API that would allow exposing things that are not directly related to any device class like net_device or ib_device, but rather chip-wide/switch-ASIC-wide stuff.

Use cases:
1) get/set of port type (Ethernet/InfiniBand)
2) setting up port splitters - split port into multiple ones and squash again, enables usage of splitter cable
3) setting up shared buffers - shared among multiple ports within one chip (work in progress)
4) configuration of switch wide properties - resources division etc - This will allow passing configuration that is unacceptable to be passed as a module option.

The first patch of this set introduces a new generic Netlink based interface, called "devlink". It is similar to the nl80211 model and is heavily influenced by it, including the API definition. The devlink introduction patch implements use cases 1) and 2). The other 2 are in development at the moment and will be addressed by follow-ups.

It is very convenient for drivers to use devlink, as you can see in other patches in this set.

The counterpart for devlink is a userspace tool, for now called "dl". Its command line interface and outputs are derived from the "ip" tool, so it should be easy for users to get used to it.

It is available here as a standalone tool for now:
https://github.com/jpirko/devlink

After this is merged in the kernel, I will include the "dl" or "devlink" tool into the iproute2 toolset.

Port type setting example:
    myhost:~$ dl help
    Usage: dl [ OPTIONS ] OBJECT { COMMAND | help }
    where  OBJECT := { dev | port | monitor }
           OPTIONS := { -v/--verbose }
    myhost:~$ dl dev help
    Usage: dl dev show [DEV]
    myhost:~$ dl dev show
    pci/0000:01:00.0
    myhost:~$ dl port help
    Usage: dl port show [DEV/PORT_INDEX]
    Usage: dl port set DEV/PORT_INDEX [ type { eth | ib | auto} ]
    Usage: dl port split DEV/PORT_INDEX count
    Usage: dl port unsplit DEV/PORT_INDEX
    myhost:~$ dl port show
    pci/0000:01:00.0/1: type ib ibdev mlx4_0
    pci/0000:01:00.0/2: type ib ibdev mlx4_0
    myhost:~$ sudo dl port set pci/0000:01:00.0/1 type eth
    myhost:~$ dl port show
    pci/0000:01:00.0/1: type eth netdev ens4
    pci/0000:01:00.0/2: type ib ibdev mlx4_0
    myhost:~$ sudo dl port set ens4 type auto
    myhost:~$ dl port show
    pci/0000:01:00.0/1: type eth(auto) netdev ens4
    pci/0000:01:00.0/2: type ib ibdev mlx4_0

Port splitting example:
    myswitch:~$ sudo modprobe mlxsw_pci
    myswitch:~$ dl port
    pci/0000:03:00.0/1: type eth netdev eth0
    pci/0000:03:00.0/3: type eth netdev eth1
    pci/0000:03:00.0/5: type eth netdev eth2
    ...
    pci/0000:03:00.0/63: type eth netdev eth31
    myswitch:~$ sudo dl port split pci/0000:03:00.0/1 2
    (or "sudo dl port split eth0 2")
    myswitch:~$ dl port
    pci/0000:03:00.0/3: type eth netdev eth1
    pci/0000:03:00.0/5: type eth netdev eth2
    ...
    pci/0000:03:00.0/63: type eth netdev eth31
    pci/0000:03:00.0/1: type eth netdev eth0 split_group 16
    pci/0000:03:00.0/2: type eth netdev eth32 split_group 16
    myswitch:~$ sudo dl port unsplit pci/0000:03:00.0/1
    myswitch:~$ dl port
    pci/0000:03:00.0/3: type eth netdev eth1
    pci/0000:03:00.0/5: type eth netdev eth2
    pci/0000:03:00.0/63: type eth netdev eth31
    pci/0000:03:00.0/1: type eth netdev eth0

v2->v3:
patch 1/9
 -removed generated devlink index and name, use bus name and dev name as a handle for all userspace originated commands. Along with that, remove sysfs stub. Requested by Hannes Sowa.
patch 2/9
 -add dev param to devlink_register (api change)
patch 4/9
 -add dev param to devlink_register (api change)
patch 9/9
 -set port's speed according to width fix by Ido

v1->v2:
patch 1/9
 -removed no longer used "devlink_dev" helper
 -fix couple of typos and misspells
patch 4/9:
 -removed SET_NETDEV_DEV set to devlink dev
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
Allow a user to split or unsplit a port using the newly introduced devlink ops. Once split, the original netdev is destroyed and 2 or 4 others are created, according to user configuration. The new ports are like any other port, with the sole difference of supporting a lower maximum speed. When unsplit, the reverse process takes place. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
When splitting and unsplitting we'll destroy usable ports on the fly, so mark them using a NULL pointer to indicate that their local port number is free and can be re-used. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
The port netdevs are each associated with a different local port number in the device. These local ports are grouped into groups of 4 (e.g. (1-4), (5-8)) called clusters. The ports in a cluster can be mapped to one of two possible modules. This mapping is board-specific and done by the device's firmware during init. When splitting a port by 4, the device requires us to first unmap all the ports in the cluster and then map each to a single lane in the module associated with the port netdev used as the handle for the operation. This means that two port netdevs will disappear, as only 100Gb/s (4 lanes) ports can be split and we are guaranteed to have two of these ((1, 3), (5, 7) etc.) in a cluster. When unsplit occurs we need to reinstantiate the two original 100Gb/s ports and map each to its original module. Therefore, during driver init store the initial local port to module mapping, so it can be used later during unsplitting. Note that a by-2 split doesn't require us to store the mapping, as we only need to reinstantiate one port whose module is known. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
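The cluster arithmetic implied by the message can be written out as follows. This is a sketch only: the function names are hypothetical, local ports are taken as 1-based per the (1-4), (5-8) example, and the lane index (position of a port inside its cluster) is an assumption about how the four ports would be spread over the module's lanes.

```c
#include <assert.h>

/* From the commit text: local ports are grouped in clusters of 4. */
#define SKETCH_CLUSTER_WIDTH 4u

/* First local port of the cluster containing 'local_port' (1-based). */
static unsigned int sketch_cluster_base(unsigned int local_port)
{
	assert(local_port >= 1);
	return ((local_port - 1) / SKETCH_CLUSTER_WIDTH) *
	       SKETCH_CLUSTER_WIDTH + 1;
}

/* Assumed lane index: position of the port within its cluster (0..3). */
static unsigned int sketch_cluster_lane(unsigned int local_port)
{
	assert(local_port >= 1);
	return (local_port - 1) % SKETCH_CLUSTER_WIDTH;
}
```

With these helpers, a by-4 split on local port 7 would touch ports 5 through 8, which is why both 100Gb/s ports of that cluster (5 and 7) disappear.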
-
Ido Schimmel authored
When splitting a port we replace it with 2 or 4 other ports. To be able to do that we need to remove the original port netdev and unmap it from its module. However, we first mark it as disabled, as active ports cannot be unmapped. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
Add middle layer in mlxsw core code to forward port split/unsplit calls into specific ASIC drivers. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
Implement newly introduced devlink interface. Add devlink port instances for every port and set the port types accordingly. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
So far, there has been an mlx4-specific sysfs file allowing the user to change the port type to either Ethernet or InfiniBand. This is very inconvenient. Expose the same ability to set the port type in a generic way using the devlink interface. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
Implement newly introduced devlink interface. Add devlink port instances for every port and set the port types accordingly. Signed-off-by: Jiri Pirko <jiri@mellanox.com> v2->v3: -add dev param to devlink_register (api change) Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
Introduce devlink infrastructure for drivers to register and expose to userspace via generic Netlink interface. There are two basic objects defined: devlink, with one instance for every "parent device" (for example a switch ASIC), and devlink port, with one instance for every physical port of the device. This initial portion implements basic get/dump of objects to userspace. Also, port splitter and port type setting are implemented. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
John Fastabend says:

====================
tc software only

This adds a software only flag to tc but incorporates a bunch of comments from the original attempt at this.

First, instead of having the offload decision logic embedded in cls_u32, I lifted it into pkt_cls.h so it can be used anywhere and named the flag TCA_CLS_FLAGS_SKIP_HW (Thanks Jiri ;)

In order to do this I put the flag defines in pkt_cls.h as well. However, it was suggested that perhaps these flags could be lifted into the upper layer of TCA_ as well, but I'm afraid this cannot be done with the existing tc design as far as I can tell. The problem is that the filters are packed and unpacked in the classifier specific code, and pushing the flags through the high level doesn't seem easily doable. And we already have this design where classifiers handle generic options such as actions and policers. So I think adding one more thing here is OK as 'tc', et al. already know how to handle this type of thing.
====================

Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
John Fastabend authored
In the initial implementation the only way to stop a rule from being inserted into the hardware table was via the device feature flag. However, this doesn't work well when working on an end host system where packets are expected to hit both the hardware and software datapaths.

For example, we can imagine a rule that will match an IP address and increment a field. If we install this rule in both hardware and software we may increment the field twice. To date we have only added support for the drop action, so we have been able to ignore these cases. But as we extend the action support we will hit this example plus more such cases. Arguably these are not even corner cases; in many working systems these cases will be common.

To avoid forcing the driver to always abort (i.e. the above example), this patch adds a flag to add a rule in software only. A careful user can use this flag to build software and hardware datapaths that work together. One example we have found particularly useful is to use hardware resources to set the skb->mark on the skb when the match may be expensive to run in software but a mark lookup in a hash table is cheap. The idea here is that hardware can do in one lookup what the u32 classifier may need to traverse multiple lists and hash tables to compute. The flag is only passed down on inserts. On deletion, to avoid stale references in hardware, we always try to remove a rule if it exists.

The flags field is part of the classifier specific options. Although it is tempting to lift this into the generic structure, doing this proves difficult due to how the tc netlink attributes are implemented along with how the dump/change routines are called. There is also precedent for putting seemingly generic pieces in the specific classifier options, such as TCA_U32_POLICE, TCA_U32_ACT, etc. So although not ideal, I've left FLAGS in the u32 options as well, as it simplifies the code greatly and user space has already learned how to manage these bits a la the 'tc' tool.

Another thing: when trying to update a rule, we require the flags to be unchanged. This is to force user space, software u32 and the hardware u32 to keep in sync. Thanks to Simon Horman for catching this case.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
John Fastabend authored
In the original series drivers would get offload requests for cls_u32 rules even if the feature bit is disabled. This meant the driver had to do a boiler plate check on the feature bit before adding/deleting the rule. This patch lifts the check into the core code and removes it from the driver specific case. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
John Fastabend authored
The offload decision was originally very basic and tied to whether the dev implemented the appropriate ndo op hook. The next step is to allow the user to more flexibly define whether any particular rule should be offloaded or not. In order to have this logic in one function, lift the current check into a helper routine, tc_should_offload(). Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Paolo Abeni says:

====================
bridge/ovs: avoid skb head copy on frame forwarding

Currently, when an OVS or Linux bridge is used to forward frames towards some tunnel device, a skb_head_copy() may occur if the ingress device does not provide enough headroom for the tx encapsulation.

This patch series tries to address the issue by implementing a new ndo operation to allow the master device to control the headroom used when allocating the skb on frame reception.

Said operation is used by the Linux bridge to notify the bridged ports of needed_headroom changes, and similar bookkeeping and behaviour is also added to openvswitch, on a per datapath basis.

Finally, the operation is implemented for veth and tun devices, which gives a performance improvement in the 6-12% range when forwarding frames from said devices towards a vxlan tunnel.

v2:
- fix netdev_get_fwd_headroom() behaviour
- remove some code duplication with the netdev_set_rx_headroom() and netdev_reset_rx_headroom() helpers
- handle headroom reset on [v]port removal/deletion
- initialize tun align to the old default value

v3:
- fix a comment typo
====================

Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Paolo Abeni authored
The rx headroom for veth dev is the peer device needed_headroom. Avoid ping-pong updates setting the private flag IFF_PHONY_HEADROOM. This avoids skb head reallocation when forwarding from a veth dev towards a device adding some kind of encapsulation. When transmitting frames below the MTU size towards a vxlan device, this gives about 10% performance speed-up when OVS is used to connect the veth and the vxlan device and a little more when using a plain Linux bridge. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Paolo Abeni authored
ndo_set_rx_headroom controls the align value used by tun devices to allocate skbs on frame reception. When the xmit device adds a large encapsulation, this avoids an skb head reallocation on forwarding. The measured improvement when forwarding towards a vxlan dev with frame size below the egress device MTU is as follows:

    vxlan over ipv6, bridged: +6%
    vxlan over ipv6, ovs: +7%

In case of ipv4 tunnels there is no improvement, since the tun device default alignment provides enough headroom to avoid the skb head reallocation.

Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Paolo Abeni authored
This patch implements bookkeeping support to compute the maximum headroom for all the devices in each datapath. When said value changes, the underlying devs are notified via the ndo_set_rx_headroom method. This also increases the internal vports xmit performance. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
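The bookkeeping described here amounts to tracking a maximum and notifying only on change. A minimal sketch, assuming a hypothetical `struct sketch_datapath` and an array of per-device needed headroom values in place of real vports:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-datapath state. */
struct sketch_datapath {
	unsigned int max_headroom;  /* last value pushed to the devices */
};

/* Largest needed headroom among the devices attached to the datapath. */
static unsigned int sketch_max_headroom(const unsigned int *needed, size_t n)
{
	unsigned int max = 0;

	for (size_t i = 0; i < n; i++)
		if (needed[i] > max)
			max = needed[i];
	return max;
}

/*
 * Recompute the maximum. Returns true when the value changed, which is
 * the point where the underlying devices would be notified.
 */
static bool sketch_update_headroom(struct sketch_datapath *dp,
				   const unsigned int *needed, size_t n)
{
	unsigned int max;

	assert(dp != NULL);
	max = sketch_max_headroom(needed, n);
	if (max == dp->max_headroom)
		return false;
	dp->max_headroom = max;
	return true;
}
```

Returning whether the value changed lets the caller skip the notification pass when a device is added or removed without affecting the maximum.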
-
Paolo Abeni authored
On bridge needed_headroom changes, the enslaved devices are notified via the ndo_set_rx_headroom method. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Paolo Abeni authored
This method allows the controlling device (i.e. the bridge) to specify additional headroom to be allocated for skb head on frame reception. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Michael Chan says: ==================== bnxt_en: updates for net-next. Miscellaneous updates covering SRIOV, IRQ coalescing, firmware logging and package version for net-next. Thanks. v2: Updated description and added more comments for patch 1. Fixed function parameters formatting for patch 4. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Michael Chan authored
This is used to send NVM_FIND_DIR_ENTRY messages, which can return an error if the entry is not found. This is normal and the error message will cause unnecessary alarm, so silence it. Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Michael Chan authored
Add a new function bnxt_do_send_msg() to do essentially the same thing with an additional parameter to silence error response messages. All current callers will set silent to false. Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
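The pattern of one worker with a `silent` parameter plus thin wrappers might look like the sketch below. All names are hypothetical (`sketch_do_send_msg`, the transport callback type, and the stub that always fails); the real function's signature is not given in the message and is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical transport callback: returns 0 on success, nonzero on error. */
typedef int (*sketch_xmit_fn)(const void *msg);

/* Send a message; log a failure unless the caller asked for silence. */
static int sketch_do_send_msg(sketch_xmit_fn xmit, const void *msg, bool silent)
{
	int rc;

	assert(xmit != NULL);
	rc = xmit(msg);
	if (rc != 0 && !silent)
		fprintf(stderr, "firmware message failed: %d\n", rc);
	return rc;
}

/* Existing callers keep the noisy behaviour. */
static int sketch_send_msg(sketch_xmit_fn xmit, const void *msg)
{
	return sketch_do_send_msg(xmit, msg, false);
}

/* For lookups where "not found" is an expected outcome. */
static int sketch_send_msg_silent(sketch_xmit_fn xmit, const void *msg)
{
	return sketch_do_send_msg(xmit, msg, true);
}

/* Stub transport for illustration: always reports an error. */
static int sketch_xmit_not_found(const void *msg)
{
	(void)msg;
	return -1;
}
```

Both wrappers return the same error code; only the logging differs, so callers that expect a miss can still branch on the result.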
-
Rob Swindell authored
For everything to fit, we remove the PHY microcode version and replace it with the firmware package version in the fw_version string. Signed-off-by: Rob Swindell <swindell@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Michael Chan authored
Use appropriate firmware request header structure to prepare the firmware messages. This avoids the unnecessary conversion of the fields to 32-bit fields. Add appropriate endian conversion when printing out the message fields in dmesg so that they appear correct in the log. Reported-by: Rob Swindell <swindell@broadcom.com> Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Michael Chan authored
Before this patch, we used a hardcoded value of 500 msec as the default value for firmware message response timeout. For better portability with future hardware or debug platforms, use the value provided by firmware in the first response and store it for all subsequent messages. Redefine the macro HWRM_CMD_TIMEOUT to the stored value. Since we don't have the value yet in the first message, use the 500 ms default if the stored value is zero. Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
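The fallback rule (use the stored value, or 500 ms while it is still zero) is small enough to show directly. The struct and field names below are hypothetical; only the 500 ms default and the zero check come from the message.

```c
#include <assert.h>
#include <stddef.h>

/* Default from the commit text, used until firmware reports a value. */
#define SKETCH_DFLT_CMD_TIMEOUT_MS 500u

/* Hypothetical per-device state. */
struct sketch_bp {
	unsigned int cmd_timeout_ms;  /* 0 until the first response arrives */
};

/* Timeout to use for the next firmware message. */
static unsigned int sketch_cmd_timeout(const struct sketch_bp *bp)
{
	assert(bp != NULL);
	return bp->cmd_timeout_ms ? bp->cmd_timeout_ms
				  : SKETCH_DFLT_CMD_TIMEOUT_MS;
}
```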
-
Michael Chan authored
When tx and rx rings don't share the same completion ring, tx coalescing parameters can be set differently from the rx coalescing parameters. Otherwise, use rx coalescing parameters on shared completion rings. Adjust rx coalescing default values to lower interrupt rate. Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Michael Chan authored
Add a function to set all the coalescing parameters. The function can be used later to set both rx and tx coalescing parameters. v2: Fixed function parameters formatting requested by DaveM. Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Michael Chan authored
Don't convert these to internal hardware tick values before storing them. This avoids the confusion of ethtool -c returning slightly different values than the ones set using ethtool -C when we convert hardware tick values back to microseconds. Add better comments for the hardware settings. Also, rename the current set of coalescing fields with rx_ prefix. The next patch will add support of tx coalescing values. Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
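The mismatch between set and reported values comes from integer rounding in the round trip. The sketch below uses a hypothetical tick length of 80 ns (25 ticks per 2 microseconds); the actual hardware tick unit is not stated in the message, so the factor is an assumption chosen only to make the rounding visible.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tick length of 80 ns, i.e. 25 ticks per 2 microseconds. */
static uint32_t sketch_usec_to_ticks(uint32_t usec)
{
	assert(usec <= UINT32_MAX / 25);  /* keep the multiply from overflowing */
	return usec * 25 / 2;
}

/* Inverse conversion; integer division truncates the result. */
static uint32_t sketch_ticks_to_usec(uint32_t ticks)
{
	assert(ticks <= UINT32_MAX / 2);
	return ticks * 2 / 25;
}
```

Under this assumed factor, 5 microseconds becomes 62 ticks and converts back to 4 microseconds. Storing the microsecond value as given, and converting only when programming the hardware, avoids reporting the truncated number.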
-
Jeffrey Huang authored
During remove_one() when SRIOV is enabled, the PF driver should broadcast a PF driver unload notification to all VFs that are attached to VMs. Upon receiving the PF driver unload notification, the VF driver should print a warning message to the message log. Certain operations on the VF may not succeed after the PF has unloaded. Signed-off-by: Jeffrey Huang <huangjw@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-