1. 23 Jul, 2018 4 commits
    • Merge branch 'net-bridge-add-support-for-backup-port' · f8b2990f
      David S. Miller authored
      Nikolay Aleksandrov says:
      
      ====================
      net: bridge: add support for backup port
      
      This set introduces a new bridge port option that allows any port to have
      any other port (in the same bridge, of course) as its backup; traffic
      will be forwarded to the backup port when the primary goes down. This is
      mainly used in MLAG and EVPN setups where we have a peerlink path which is
      a backup of many (or even all) ports and is a participating bridge port
      itself. There's more detailed information in patch 02. Patch 01 just
      prepares the port sysfs code for options that take a raw value. The main
      issues that this set solves are scalability and fallback latency.
      
      We have used similar code for over 6 months now to bring the fallback
      latency of the backup peerlink down and avoid fdb notification storms.
      Also, due to the nature of master devices, such a setup is currently not
      possible, and last but not least, having tens of thousands of fdbs
      requires thousands of calls to switch.
      
      I've also CCed our MLAG experts who have been using a similar option.
      
      Roopa also adds:
      
      "Two switches acting in a MLAG pair are connected by the peerlink
      interface which is a bridge port.
      
      The config on one of the switches looks like the below. The other
      switch also has a similar config.
      eth0 is connected to one port on the server. And the server is
      connected to both switches.
      
      br0 -- team0---eth0
            |
            -- switch-peerlink
      
      switch-peerlink becomes the failover/backup port when, say, team0 to
      the server goes down.
      Today, when team0 goes down, the control plane has to withdraw all the
      fdb entries pointing to team0 and re-install the fdb entries pointing
      to switch-peerlink, and then restore the fdb entries when team0 comes
      back up again. This is the problem we are trying to solve.
      
      This also becomes necessary when multihoming is implemented by a
      standard like E-VPN https://tools.ietf.org/html/rfc8365#section-8
      where the 'switch-peerlink' is an overlay vxlan port (as Nikolay
      mentions in his patch commit). In these implementations, the fdb scale
      can be much larger.
      
      As for why bond failover cannot be used here: the point that Nikolay
      was alluding to is that switch-peerlink in the above example is a
      bridge port and is a failover/backup port for more than one or all
      ports in the bridge br0. And you cannot enslave switch-peerlink into a
      second-level team with other bridge ports. Hence a multi-layered team
      device is not an option (FWIW, switch-peerlink is also a teamed
      interface to the peer switch)."
      
      v3: Added Roopa's explanation and diagram
      v2: In patch 01 use kstrdup/kfree to avoid casting the const buf. In order
      to avoid using GFP_ATOMIC or always allocating I kept the spinlock inside
      each branch.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: add support for backup port · 2756f68c
      Nikolay Aleksandrov authored
      This patch adds a new port attribute - IFLA_BRPORT_BACKUP_PORT, which
      allows setting a backup port to be used for known unicast traffic if the
      port has gone carrier down. The backup pointer is rcu protected and set
      only under RTNL; a counter is maintained so that when deleting a port we
      know how many other ports reference it as a backup and we remove it from
      all of them. Also, the pointer is in the first cache line, which is hot at
      the time of the check, and thus in the common case we only add one more
      test.
      The backup port will be used only for the non-flooding case since
      it's a part of the bridge and the flooded packets will be forwarded to it
      anyway. To remove the forwarding just send a 0/non-existing backup port.
      This is used to avoid numerous scalability problems when using MLAG, most
      notably that with thousands of fdbs one would need to change all of them
      when the port carrier goes down, which takes too long and causes a storm
      of fdb notifications (and again when the port comes back up). In a
      Multi-chassis Link Aggregation setup, hosts are usually connected to two
      different switches which act as a single logical switch. Those switches
      usually have a control and backup link between them, called the peerlink,
      which might be used for communication in case a host loses connectivity
      to one of them.
      We need a fast way to fail over in case a host port goes down, and
      currently none of the solutions (like bond) can fulfill the requirements
      because the participating ports are actually the "master" devices and
      must have the same peerlink as their backup interface and at the same
      time all of them must participate in the bridge device. As Roopa noted,
      this is normal practice in routing, called fast re-route, where a
      precalculated backup path is used when the main one is down.
      Another use case of this is with EVPN, having a single vxlan device which
      is the backup of every port. Due to the nature of master devices it's not
      currently possible to use one device as a backup for many and still have
      all of them participate in the bridge (which is a master itself).
      More detailed information about MLAG is available at the link below.
      https://docs.cumulusnetworks.com/display/DOCS/Multi-Chassis+Link+Aggregation+-+MLAG
      
      Further explanation and a diagram by Roopa:
      Two switches acting in a MLAG pair are connected by the peerlink
      interface which is a bridge port.
      
      The config on one of the switches looks like the below. The other
      switch also has a similar config.
      eth0 is connected to one port on the server. And the server is
      connected to both switches.
      
      br0 -- team0---eth0
            |
            -- switch-peerlink
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
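The behaviour described in the commit message above (a per-port backup pointer, a reference counter on the backup, and a redirect for known unicast traffic when the carrier is down) can be sketched as a small userspace model. This is only an illustration under stated assumptions: `struct demo_port` and the `demo_port_*` helpers are hypothetical names, not the kernel's structures or functions, and the RCU and RTNL protection mentioned in the commit is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a bridge port; not the kernel struct. */
struct demo_port {
    const char *name;
    bool carrier_ok;              /* false when the link is down */
    struct demo_port *backup;     /* NULL when no backup is configured */
    unsigned int backup_refcnt;   /* number of ports using this one as backup */
};

/* Set the backup of port p, or clear it when new_backup is NULL. */
static void demo_port_set_backup(struct demo_port *p,
                                 struct demo_port *new_backup)
{
    if (p->backup == new_backup)
        return;
    if (p->backup) {
        assert(p->backup->backup_refcnt > 0);
        p->backup->backup_refcnt--;
    }
    if (new_backup)
        new_backup->backup_refcnt++;
    p->backup = new_backup;
}

/* Before deleting port gone, remove it as a backup from every other port. */
static void demo_port_clear_backup_refs(struct demo_port *ports[], size_t n,
                                        struct demo_port *gone)
{
    /* The counter lets the scan stop as soon as no references remain. */
    for (size_t i = 0; i < n && gone->backup_refcnt > 0; i++)
        if (ports[i]->backup == gone)
            demo_port_set_backup(ports[i], NULL);
    /* Also release whatever backup the deleted port itself was using. */
    demo_port_set_backup(gone, NULL);
}

/* Pick the egress port for known unicast traffic destined to port to. */
static struct demo_port *demo_port_select_egress(struct demo_port *to)
{
    /* Common case: no backup set, so only one extra test is paid. */
    if (to->backup && !to->carrier_ok)
        return to->backup;
    return to;
}
```

In this model the fdb entries never change when a port loses carrier; only the egress choice does, which is the point the commit message makes about avoiding fdb churn.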
    • net: bridge: add support for raw sysfs port options · a5f3ea54
      Nikolay Aleksandrov authored
      This patch adds a new alternative store callback for port sysfs options
      which takes a raw value (buf) and can use it directly. It is needed for the
      backup port sysfs support since we have to pass the device by its name.
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
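The idea of an alternative store callback that receives the raw buffer can be sketched in userspace roughly as follows. This is a minimal model, not the bridge sysfs code: `struct demo_brport`, `struct demo_attr`, and all function names are hypothetical, `malloc`/`free` stand in for the `kstrdup`/`kfree` pair mentioned in the v2 notes of the cover letter, and locking is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_NAME_MAX 16

/* Hypothetical port state touched by the two example attributes. */
struct demo_brport {
    unsigned long cost;
    char backup_name[DEMO_NAME_MAX];
};

/* Hypothetical attribute: either store or store_raw is set. */
struct demo_attr {
    const char *name;
    int (*store)(struct demo_brport *p, unsigned long val);
    int (*store_raw)(struct demo_brport *p, char *buf);
};

/* Numeric option: receives the already parsed value. */
static int demo_store_cost(struct demo_brport *p, unsigned long val)
{
    p->cost = val;
    return 0;
}

/* Raw option: receives a writable copy of the user's string. */
static int demo_store_backup_name(struct demo_brport *p, char *buf)
{
    buf[strcspn(buf, "\n")] = '\0';   /* strip the trailing newline */
    if (strlen(buf) >= sizeof(p->backup_name))
        return -EINVAL;
    strcpy(p->backup_name, buf);
    return 0;
}

/* Dispatch a write: prefer the raw callback, else parse a number. */
static int demo_attr_store(const struct demo_attr *attr,
                           struct demo_brport *p, const char *buf)
{
    int ret = -EINVAL;

    assert(attr && p && buf);
    if (attr->store_raw) {
        /* Copy the const input so the callback may modify it. */
        size_t len = strlen(buf);
        char *copy = malloc(len + 1);

        if (!copy)
            return -ENOMEM;
        memcpy(copy, buf, len + 1);
        ret = attr->store_raw(p, copy);
        free(copy);
    } else if (attr->store) {
        char *end;
        unsigned long val = strtoul(buf, &end, 0);

        if (end != buf)               /* reject input with no digits */
            ret = attr->store(p, val);
    }
    return ret;
}
```

Copying the buffer only in the raw branch mirrors the v2 note about not allocating on every write; the numeric path stays allocation-free.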
    • net: mediatek: use dma_zalloc_coherent instead of allocator/memset · 0a78c380
      YueHaibing authored
      Use dma_zalloc_coherent instead of dma_alloc_coherent
      followed by memset 0.
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 22 Jul, 2018 18 commits
  3. 21 Jul, 2018 18 commits