1. 21 Jul, 2015 27 commits
    • Joachim Eastwood's avatar
      stmmac: export probe_config_dt() and get_platform_resources() · 402dae0b
      Joachim Eastwood authored
      Export stmmac_probe_config_dt() and stmmac_get_platform_resources()
      so they can be used in the dwmac-* drivers themselves. This will
      allow us to build more flexible and standalone drivers which just
      use stmmac_platform as a library for setup functions.
      Signed-off-by: Joachim Eastwood <manabian@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      402dae0b
    • Joachim Eastwood's avatar
      stmmac: make stmmac_probe_config_dt return the platform data struct · b0003ead
      Joachim Eastwood authored
      Since stmmac_probe_config_dt() allocates the platform data structure
      it is cleaner if it just returned this structure directly. This
      function will later be used in the probe function in dwmac-* drivers.
      Signed-off-by: Joachim Eastwood <manabian@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b0003ead
    • Joachim Eastwood's avatar
      stmmac: introduce stmmac_get_platform_resources() · f396cb01
      Joachim Eastwood authored
      Refactor all code that deals with platform resources into its
      own get function. This function will later be used in the probe
      function in dwmac-* drivers.
      Signed-off-by: Joachim Eastwood <manabian@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f396cb01
    • Joachim Eastwood's avatar
      stmmac: clean up platform/of_match data retrieval · 4ed2d8fc
      Joachim Eastwood authored
      Refactor code to clearly separate probing non-dt versus dt. In the
      non-dt case platform data must be supplied to probe successfully.
      For dt the platform data structure is created and match data is
      copied into it. Note that support for supplying platform data in
      dt from AUXDATA is dropped as no users in mainline do this.
      
      This change will allow dt dwmac-* drivers to call the config_dt()
      function from probe to create the needed platform data struct and
      retrieve common dt properties.
      Signed-off-by: Joachim Eastwood <manabian@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4ed2d8fc
    • Joachim Eastwood's avatar
      stmmac: use of_device_get_match_data to retrieve of match data · 0dacf3f6
      Joachim Eastwood authored
      By using of_device_get_match_data() the code that retrieves
      match data can be simplified quite a bit.
      Signed-off-by: Joachim Eastwood <manabian@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0dacf3f6
    • David S. Miller's avatar
      Merge branch 'tipc-separate-link-and-aggregation' · 7781e5d1
      David S. Miller authored
      Jon Maloy says:
      
      ====================
      tipc: separate link and link aggregation layer
      
      This is the first batch of a longer series that has two main objectives:
      
      o Finer lock granularity during message sending and reception,
        especially regarding usage of the node spinlock.
      
      o Better separation between the link layer implementation and the link
        aggregation layer, represented by node.c::struct tipc_node.
      
      Hopefully these changes also make this part of code somewhat easier
      to comprehend and maintain.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7781e5d1
    • Jon Paul Maloy's avatar
      tipc: reduce locking scope during packet reception · d999297c
      Jon Paul Maloy authored
      We convert packet/message reception according to the same principle
      we have been using for message sending and timeout handling:
      
      We move the function tipc_rcv() to node.c, hence handling the initial
      packet reception at the link aggregation level. The function grabs
      the node lock, selects the receiving link, and accesses it via a new
      call tipc_link_rcv(). This function appends buffers to the input
      queue for delivery upwards, but it may also append outgoing packets
      to the xmit queue, just as we do during regular message sending. The
      latter will happen when buffers are forwarded from the link backlog,
      or when retransmission is requested.
      
      Upon return of this function, and after having released the node lock,
      tipc_rcv() delivers/transmits the contents of those queues, but it may
      also perform actions such as link activation or reset, as indicated by
      the return flags from the link.
      
      This reduces the number of cpu cycles spent inside the node spinlock,
      and reduces contention on that lock.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d999297c
    • Jon Paul Maloy's avatar
      tipc: introduce node contact FSM · 1a20cc25
      Jon Paul Maloy authored
      The logic for determining when a node is permitted to establish
      and maintain contact with its peer node becomes non-trivial in the
      presence of multiple parallel links that may come and go independently.
      
      A known failure scenario is that one endpoint registers both its links
      to the peer as lost, cleans up its binding table, and prepares for a table
      update once contact is re-established, while the other endpoint may
      see its links reset and re-established one by one, hence seeing
      no need to re-synchronize the binding table. To avoid this, a node
      must not allow re-establishing contact until it has confirmation that
      even the peer has lost both links.
      
      Currently, the mechanism for handling this consists of setting and
      resetting two state flags from different locations in the code. This
      solution is hard to understand and maintain. A closer analysis even
      reveals that it is not completely safe.
      
      In this commit we do instead introduce an FSM that keeps track of
      the conditions for when the node can establish and maintain links.
      It has six states and four events, and is strictly based on explicit
      knowledge about the own node's and the peer node's contact states.
      Only events leading to state change are shown as edges in the figure
      below.
      
                                   +--------------+
                                   | SELF_UP/     |
                 +---------------->| PEER_COMING  |-----------------+
          SELF_  |                 +--------------+                 |PEER_
          ESTBL_ |                        |                         |ESTBL_
          CONTACT|      SELF_LOST_CONTACT |                         |CONTACT
                 |                        v                         |
                 |                 +--------------+                 |
                 |      PEER_      | SELF_DOWN/   |     SELF_       |
                 |      LOST_   +--| PEER_LEAVING |<--+ LOST_       v
      +-------------+   CONTACT |  +--------------+   | CONTACT  +-----------+
      | SELF_DOWN/  |<----------+                     +----------| SELF_UP/  |
      | PEER_DOWN   |<----------+                     +----------| PEER_UP   |
      +-------------+   SELF_   |  +--------------+   | PEER_    +-----------+
                 |      LOST_   +--| SELF_LEAVING/|<--+ LOST_       A
                 |      CONTACT    | PEER_DOWN    |     CONTACT     |
                 |                 +--------------+                 |
                 |                         A                        |
          PEER_  |       PEER_LOST_CONTACT |                        |SELF_
          ESTBL_ |                         |                        |ESTBL_
          CONTACT|                 +--------------+                 |CONTACT
                 +---------------->| PEER_UP/     |-----------------+
                                   | SELF_COMING  |
                                   +--------------+
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1a20cc25
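
The six states and four events of the node contact FSM can be written down as a small transition function. The following is a minimal sketch transcribed from the edges in the figure, not the code from the commit itself. The function name node_fsm_next() and the enum spellings are assumptions made for illustration, and any event that has no edge in the figure is assumed to leave the state unchanged.

```c
/* States of the node contact FSM, named after the boxes in the figure. */
enum node_state {
	SELF_DOWN_PEER_DOWN,
	SELF_UP_PEER_COMING,
	PEER_UP_SELF_COMING,
	SELF_UP_PEER_UP,
	SELF_DOWN_PEER_LEAVING,
	SELF_LEAVING_PEER_DOWN
};

/* Events, named after the edge labels in the figure. */
enum node_event {
	SELF_ESTBL_CONTACT,
	PEER_ESTBL_CONTACT,
	SELF_LOST_CONTACT,
	PEER_LOST_CONTACT
};

/*
 * Hypothetical helper: return the state reached from 's' on event 'e'.
 * Only the edges drawn in the figure cause a change; every other
 * combination is assumed to keep the current state.
 */
enum node_state node_fsm_next(enum node_state s, enum node_event e)
{
	switch (s) {
	case SELF_DOWN_PEER_DOWN:
		if (e == SELF_ESTBL_CONTACT)
			return SELF_UP_PEER_COMING;
		if (e == PEER_ESTBL_CONTACT)
			return PEER_UP_SELF_COMING;
		break;
	case SELF_UP_PEER_COMING:
		if (e == PEER_ESTBL_CONTACT)
			return SELF_UP_PEER_UP;
		if (e == SELF_LOST_CONTACT)
			return SELF_DOWN_PEER_LEAVING;
		break;
	case PEER_UP_SELF_COMING:
		if (e == SELF_ESTBL_CONTACT)
			return SELF_UP_PEER_UP;
		if (e == PEER_LOST_CONTACT)
			return SELF_LEAVING_PEER_DOWN;
		break;
	case SELF_UP_PEER_UP:
		if (e == SELF_LOST_CONTACT)
			return SELF_DOWN_PEER_LEAVING;
		if (e == PEER_LOST_CONTACT)
			return SELF_LEAVING_PEER_DOWN;
		break;
	case SELF_DOWN_PEER_LEAVING:
		if (e == PEER_LOST_CONTACT)
			return SELF_DOWN_PEER_DOWN;
		break;
	case SELF_LEAVING_PEER_DOWN:
		if (e == SELF_LOST_CONTACT)
			return SELF_DOWN_PEER_DOWN;
		break;
	}
	return s;
}
```

In this sketch, a node in SELF_DOWN_PEER_LEAVING ignores SELF_ESTBL_CONTACT and only returns to SELF_DOWN_PEER_DOWN on PEER_LOST_CONTACT, which mirrors the rule that contact must not be re-established until the peer is known to have lost its links too.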
    • Jon Paul Maloy's avatar
      tipc: move link supervision timer to node level · 8a1577c9
      Jon Paul Maloy authored
      In our effort to move control of the links to the link aggregation
      layer, we move the periodic link supervision timer to struct tipc_node.
      The new timer is shared between all links belonging to the node, thus
      saving resources, while still kicking the FSM on both its pertaining
      links at each expiration.
      
      The current link timer and corresponding functions are removed.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8a1577c9
    • Jon Paul Maloy's avatar
      tipc: simplify link timer implementation · 333ef69e
      Jon Paul Maloy authored
      We create a second, simpler, link timer function, tipc_link_timeout().
      The new function makes use of the new FSM function introduced in the
      previous commit, and just like it, takes a buffer queue as parameter.
      It returns an event bit field and potentially a link protocol packet
      to the caller.
      
      The existing timer function, link_timeout(), is still needed for a
      while, so we redesign it to become a wrapper around the new function.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      333ef69e
    • Jon Paul Maloy's avatar
      tipc: improve link FSM implementation · 6ab30f9c
      Jon Paul Maloy authored
      The link FSM implementation is currently unnecessarily complex.
      It sometimes checks for conditional state outside the FSM data
      before deciding next state, and often performs actions directly
      inside the FSM logic.
      
      In this commit, we create a second, simpler FSM implementation,
      that as far as possible acts only on states and events that it is
      strictly defined for, and postpone any actions until it is finished
      with its decisions. It also returns an event flag field and a
      buffer queue which may potentially contain a protocol message to
      be sent by the caller.
      
      Unfortunately, we cannot yet make the FSM "clean", in the sense
      that its decisions are only based on FSM state and event, and that
      state changes happen only here. That will have to wait until the
      activate/reset logic has been cleaned up in a future commit.
      
      We also rename the link states as follows:
      
      WORKING_WORKING -> TIPC_LINK_WORKING
      WORKING_UNKNOWN -> TIPC_LINK_PROBING
      RESET_UNKNOWN   -> TIPC_LINK_RESETTING
      RESET_RESET     -> TIPC_LINK_ESTABLISHING
      
      The existing FSM function, link_state_event(), is still needed for
      a while, so we redesign it to make use of the new function.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6ab30f9c
    • Jon Paul Maloy's avatar
      tipc: introduce new link protocol msg create function · 426cc2b8
      Jon Paul Maloy authored
      As a preparation for later changes, we introduce a new function
      tipc_link_build_proto_msg(). Instead of actually sending the created
      protocol message, it only creates it and adds it to the head of a
      skb queue provided by the caller.
      
      Since we still need the existing function tipc_link_protocol_xmit()
      for a while, we redesign it to make use of the new function.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      426cc2b8
    • Jon Paul Maloy's avatar
      tipc: clean up definitions and usage of link flags · d3504c34
      Jon Paul Maloy authored
      The status flag LINK_STOPPED is not needed any more, since the
      mechanism for delayed deletion of links has been removed.
      Likewise, LINK_STARTED and LINK_START_EVT are unnecessary,
      because we can just as well start the link timer directly from
      inside tipc_link_create().
      
      We eliminate these flags in this commit.
      
      Instead of the above flags, we now introduce three new link modes,
      TIPC_LINK_OPEN, TIPC_LINK_BLOCKED and TIPC_LINK_TUNNEL. The values
      indicate whether, and in the case of TIPC_LINK_TUNNEL, which, messages
      the link is allowed to receive in this state. TIPC_LINK_BLOCKED also
      blocks timer-driven protocol messages from being sent out, and any change
      to the link FSM. Since the modes are mutually exclusive, we convert
      them to state values, and rename the 'flags' field in struct tipc_link
      to 'exec_mode'.
      
      Finally, we move the #defines for link FSM states and events from link.h
      into enums inside the file link.c, which is the real usage scope of
      these definitions.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d3504c34
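
Because the three link modes are mutually exclusive, they fit naturally into an enum rather than a set of flag bits. The sketch below illustrates that idea only. The mode names come from the commit message, while struct link_sketch, link_may_receive() and link_may_send_timer_msg() are hypothetical names. The commit message does not say which messages TIPC_LINK_TUNNEL admits, so the sketch assumes, purely for illustration, that only tunnel messages are accepted in that mode.

```c
#include <stdbool.h>

/* Mutually exclusive execution modes, as named in the commit message. */
enum link_exec_mode {
	TIPC_LINK_OPEN,
	TIPC_LINK_BLOCKED,
	TIPC_LINK_TUNNEL
};

/* Hypothetical stand-in for the 'exec_mode' field of struct tipc_link. */
struct link_sketch {
	enum link_exec_mode exec_mode;
};

/*
 * Decide whether an incoming message may be received.
 * Assumption: in TIPC_LINK_TUNNEL mode only tunnel messages pass.
 */
bool link_may_receive(const struct link_sketch *l, bool is_tunnel_msg)
{
	switch (l->exec_mode) {
	case TIPC_LINK_OPEN:
		return true;
	case TIPC_LINK_TUNNEL:
		return is_tunnel_msg;
	case TIPC_LINK_BLOCKED:
		return false;
	}
	return false;
}

/* A blocked link also suppresses timer-driven protocol messages. */
bool link_may_send_timer_msg(const struct link_sketch *l)
{
	return l->exec_mode != TIPC_LINK_BLOCKED;
}
```

A single field with one of three values makes an invalid combination, such as open and blocked at once, impossible to express, which is the benefit of converting the flags to state values.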
    • Jon Paul Maloy's avatar
      tipc: make media xmit call outside node spinlock context · af9b028e
      Jon Paul Maloy authored
      Currently, message sending is performed through a deep call chain,
      where the node spinlock is grabbed and held during a significant
      part of the transmission time. This is clearly detrimental to
      overall throughput performance; it would be better if we could send
      the message after the spinlock has been released.
      
      In this commit, we do instead let the call revert on the stack after
      the buffer chain has been added to the transmission queue, whereafter
      clones of the buffers are transmitted to the device layer outside the
      spinlock scope.
      
      As a further step in our effort to separate the roles of the node
      and link entities we also move the function tipc_link_xmit() to
      node.c, and rename it to tipc_node_xmit().
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      af9b028e
    • Jon Paul Maloy's avatar
      tipc: change sk_buffer handling in tipc_link_xmit() · 22d85c79
      Jon Paul Maloy authored
      When the function tipc_link_xmit() is given a buffer list for
      transmission, it currently consumes the list both when transmission
      is successful and when it fails, except for the special case when
      it encounters link congestion.
      
      This behavior is inconsistent, and needs to be corrected if we want
      to avoid problems in later commits in this series.
      
      In this commit, we change this to let the function consume the list
      only when transmission is successful, and leave the list with the
      sender in all other cases. We also modify the socket code so that
      it adapts to this change, i.e., purges the list when a non-congestion
      error code is returned.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      22d85c79
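
The ownership rule described in this commit can be modelled in a few lines: the transmit function consumes the list only on success, and the sender purges it on any error other than congestion. This is a minimal sketch under stated assumptions, not the kernel code. struct buf_list stands in for an sk_buff list, and the names link_xmit(), sender_xmit(), the result codes and the link conditions are all hypothetical.

```c
#include <stddef.h>

/* Hypothetical result codes; the real function uses kernel error codes. */
enum xmit_result { XMIT_OK, XMIT_CONGESTED, XMIT_FAILED };

/* Hypothetical link conditions used to drive the sketch. */
enum link_cond { LINK_UP, LINK_CONGESTED, LINK_DOWN };

/* Stand-in for a buffer list: only the number of queued buffers is kept. */
struct buf_list {
	size_t len;
};

/*
 * Transmit function: takes ownership of the list (empties it) only when
 * transmission succeeds. On any failure the list stays with the caller.
 */
enum xmit_result link_xmit(enum link_cond cond, struct buf_list *list)
{
	if (cond == LINK_CONGESTED)
		return XMIT_CONGESTED;
	if (cond == LINK_DOWN)
		return XMIT_FAILED;
	list->len = 0;	/* buffers handed over to the link */
	return XMIT_OK;
}

/*
 * Sender side: keep the list on congestion so it can be retried later,
 * purge it when a non-congestion error is returned.
 */
enum xmit_result sender_xmit(enum link_cond cond, struct buf_list *list)
{
	enum xmit_result rc = link_xmit(cond, list);

	if (rc != XMIT_OK && rc != XMIT_CONGESTED)
		list->len = 0;	/* purge: nobody else will free these buffers */
	return rc;
}
```

With this split, the callee has one consistent rule and the decision about what to do with unsent buffers sits with the caller that still owns them.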
    • Jon Paul Maloy's avatar
      tipc: use bearer index when looking up active links · 36e78a46
      Jon Paul Maloy authored
      struct tipc_node currently holds two arrays of link pointers; one,
      indexed by bearer identity, which contains all links irrespective of
      current state, and one two-slot array for the currently active link
      or links. The latter array contains direct pointers into the elements
      of the former. This has the effect that we cannot know the bearer id of
      a link when accessing it via the "active_links[]" array without actually
      dereferencing the pointer, something we want to avoid in some cases.
      
      In this commit, we do instead store the bearer identity in the
      "active_links" array, and use this as an index to find the right element
      in the overall link entry array. This change should be seen as a
      preparation for the later commits in this series.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      36e78a46
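
The change from pointers to bearer identities in "active_links" can be illustrated with a small model. This is a sketch only: the struct layouts are simplified, and MAX_BEARERS, INVALID_BEARER_ID, struct node_sketch and node_active_link() are hypothetical names chosen for the example.

```c
#include <stddef.h>

#define MAX_BEARERS 3			/* assumed array size for the sketch */
#define INVALID_BEARER_ID (-1)		/* assumed marker for an empty slot */

struct link_sketch {
	int mtu;
};

struct link_entry_sketch {
	struct link_sketch *link;	/* NULL when no link exists on the bearer */
};

struct node_sketch {
	/* All links, indexed by bearer identity, irrespective of state. */
	struct link_entry_sketch links[MAX_BEARERS];
	/* Bearer identities of the active link or links, not pointers. */
	int active_links[2];
};

/*
 * Look up an active link through its bearer identity. The bearer id is
 * available from active_links[] without dereferencing any link pointer.
 */
struct link_sketch *node_active_link(struct node_sketch *n, int slot)
{
	int bearer_id = n->active_links[slot];

	if (bearer_id == INVALID_BEARER_ID)
		return NULL;
	return n->links[bearer_id].link;
}
```

A caller that only needs the bearer identity reads n->active_links[slot] directly, while a caller that needs the link pays for one extra array lookup.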
    • Jon Paul Maloy's avatar
      tipc: move link input queue to tipc_node · d39bbd44
      Jon Paul Maloy authored
      At present, the link input queue and the name distributor receive
      queues are fields aggregated in struct tipc_link. This is a hazard,
      because a link might be deleted while a receiving socket still holds a
      reference to one of the queues.
      
      This commit fixes this bug. However, rather than adding yet another
      reference counter to the critical data path, we move the two queues
      to safe ground inside struct tipc_node, which is already protected, and
      let the link code only handle references to the queues. This is also
      in line with planned later changes in this area.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d39bbd44
    • Jon Paul Maloy's avatar
      tipc: move link creation from neighbor discoverer to node · d3a43b90
      Jon Paul Maloy authored
      As a step towards turning links into node internal entities, we move the
      creation of links from the neighbor discovery logic to the node's link
      control logic.
      
      We also create an additional entry for the link's media address in the
      newly introduced struct tipc_link_entry, since this is where it is
      needed in the upcoming commits. The current copy in struct tipc_link
      is kept for now, but will be removed later.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d3a43b90
    • Jon Paul Maloy's avatar
      tipc: introduce link entry structure to struct tipc_node · 9d13ec65
      Jon Paul Maloy authored
      struct 'tipc_node' currently contains two arrays for link attributes,
      one for the link pointers, and one for the usable link MTUs.
      
      We now group those into a new struct 'tipc_link_entry', and introduce
      one single array consisting of such entries. Apart from being a cosmetic
      improvement, this is a starting point for the strict master-slave
      relation between node and link that we will introduce in the following
      commits.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9d13ec65
    • Jiri Benc's avatar
      net: remove skb_frag_add_head · 6acc2326
      Jiri Benc authored
      It's not used anywhere.
      Signed-off-by: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6acc2326
    • David S. Miller's avatar
      Merge branch 'offload_fwd_mark' · bd265242
      David S. Miller authored
      Scott Feldman says:
      
      ====================
      switchdev: avoid duplicate packet forwarding
      
      v3:
      
       - Per Nicolas Dichtel review: remove errant empty union.
      
      v2:
      
       - Per davem review: in sk_buff, union fwd_mark with secmark to save space
         since features appear to be mutually exclusive.
       - Per Simon Horman review:
         - fix grammar in switchdev.txt wrt fwd_mark
         - remove some unrelated changes that snuck in
      
      v1:
      
      This patchset was previously submitted as RFC.  No changes from the last
      version (v2) sent under RFC.  Including RFC version history here for reference.
      
      RFC v2:
      
       - s/fwd_mark/offload_fwd_mark
       - use consume_skb rather than kfree_skb when dropping pkt on egress.
       - Use Jiri's suggestion to use ifindex of one of the ports in a group
         as the mark for all the ports in the group.  This can be done with
         no additional storage (no hashtable from v1).  To pull it off, we
         need some simple recursive routines to walk the netdev tree ensuring
         all leaves in the tree (ports) in the same group (e.g. bridge)
         belonging to the same switch device will have the same offload fwd mark.
         Maybe someone sees a better design for the recursive routines?  They're
         not too bad, and should cover the stacked driver cases.
      
      RFC v1:
      
      With switchdev support for offloading L2/L3 forwarding data path to a
      switch device, we have a general problem where both the device and the
      kernel may forward the packet, resulting in duplicate packets on the wire.
      Anytime a packet is forwarded by the device and a copy is sent to the CPU,
      there is potential for duplicate forwarding, as the kernel may also do a
      forwarding lookup and send the packet on the wire.
      
      The specific problem this patch series is interested in solving is avoiding
      duplicate packets on bridged ports.  There was a previous RFC from Roopa
      (http://marc.info/?l=linux-netdev&m=142687073314252&w=2) to address this
      problem, but didn't solve the problem of mixed ports in the bridge from
      different devices; there was no way to exclude some ports from forwarding
      and include others.  This RFC solves that problem by tagging the ingressing
      packet with a unique mark, and then comparing the packet mark with the
      egress port mark, and skip forwarding when there is a match.  For the mixed
      ports bridge case, only those ports with matching marks are skipped.
      
      The switchdev port driver must do two things:
      
      1) Generate a fwd_mark for each switch port, using some unique key of the
         switch device (and optionally port).  This is done when the port netdev
         is registered or if the port's group membership changes (joins/leaves
         a bridge, for example).
      
      2) On packet ingress from port, mark the skb with the ingress port's
         fwd_mark.  If the device supports it, it's useful to only mark skbs
         which were already forwarded by the device.  If the device does not
         support such indication, all skbs can be marked, even if they're
         local dst.
      
      Two new 32-bit fields are added to struct sk_buff and struct netdevice to
      hold the fwd_mark.  I've wrapped these with CONFIG_NET_SWITCHDEV for now. I
      tried using skb->mark for this purpose, but ebtables can overwrite the
      skb->mark before the bridge gets it, so that will not work.
      
      In general, this fwd_mark can be used for any case where a packet is
      forwarded by the device and a copy is sent to the CPU, to avoid the kernel
      re-forwarding the packet.  sFlow is another use-case that comes to mind,
      but I haven't explored the details.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bd265242
    • Scott Feldman's avatar
    • Scott Feldman's avatar
      rocker: add offload_fwd_mark support · 3f98a8e6
      Scott Feldman authored
      If device flags ingress packet as "fwd offload", mark the
      skb->offload_fwd_mark using the ingress port's dev->offload_fwd_mark.  This
      will be the hint to the kernel that this packet has already been forwarded
      by device to egress ports matching skb->offload_fwd_mark.
      
      For rocker, derive port dev->offload_fwd_mark based on device switch ID and
      port ifindex.  If port is bridged, use the bridge ifindex rather than the
      port ifindex.
      Signed-off-by: Scott Feldman <sfeldma@gmail.com>
      Acked-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3f98a8e6
    • Scott Feldman's avatar
      switchdev: add offload_fwd_mark generator helper · 1a3b2ec9
      Scott Feldman authored
      skb->offload_fwd_mark and dev->offload_fwd_mark are 32-bit and should be
      unique for device and may even be unique for a sub-set of ports within
      device, so add switchdev helper function to generate unique marks based on
      port's switch ID and group_ifindex.  group_ifindex would typically be the
      container dev's ifindex, such as the bridge's ifindex.
      
      The generator uses a global hash table to store offload_fwd_marks hashed by
      {switch ID, group_ifindex} key.
      Signed-off-by: Scott Feldman <sfeldma@gmail.com>
      Acked-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1a3b2ec9
    • Scott Feldman's avatar
    • Scott Feldman's avatar
      net: don't reforward packets already forwarded by offload device · 0c4f691f
      Scott Feldman authored
      Just before queuing skb for xmit on port, check if skb has been marked by
      switchdev port driver as already forwarded by device.  If so, drop skb.  A
      non-zero skb->offload_fwd_mark field is set by the switchdev port
      driver/device on ingress to indicate the skb has already been forwarded by
      the device to egress ports with matching dev->offload_fwd_mark.  The switchdev port
      driver would assign a non-zero dev->offload_fwd_mark for each device port
      netdev during registration, for example.
      Signed-off-by: Scott Feldman <sfeldma@gmail.com>
      Acked-by: Jiri Pirko <jiri@resnulli.us>
      Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0c4f691f
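
The egress check described here reduces to one comparison: drop the packet when it carries a non-zero mark that equals the mark of the egress port. The sketch below models that rule in plain C. struct pkt_sketch and struct port_sketch are simplified stand-ins for struct sk_buff and the port netdev, and should_drop_on_egress() is a hypothetical name, not the kernel function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the 32-bit mark carried in the skb. */
struct pkt_sketch {
	uint32_t offload_fwd_mark;
};

/* Stand-in for the 32-bit mark assigned to a port netdev. */
struct port_sketch {
	uint32_t offload_fwd_mark;
};

/*
 * Return true when the device has already forwarded this packet to the
 * given egress port, so the kernel must not forward it a second time.
 * A zero mark on the packet means it was not marked on ingress.
 */
bool should_drop_on_egress(const struct pkt_sketch *p,
			   const struct port_sketch *out)
{
	return p->offload_fwd_mark != 0 &&
	       p->offload_fwd_mark == out->offload_fwd_mark;
}
```

In a bridge with ports from different devices, only ports whose mark matches the packet's mark are skipped, so the kernel still forwards to the remaining ports.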
    • Simon Horman's avatar
      rocker: forward packets to CPU when port is joined to openvswitch · 8254973f
      Simon Horman authored
      Teach rocker to forward packets to CPU when a port is joined to Open vSwitch.
      There is scope to later refine what is passed up as per Open vSwitch flows
      on a port.
      
      This does not change the behaviour of rocker ports that are
      not joined to Open vSwitch.
      Signed-off-by: Simon Horman <simon.horman@netronome.com>
      Acked-by: Scott Feldman <sfeldma@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8254973f
  2. 20 Jul, 2015 11 commits
  3. 16 Jul, 2015 2 commits
    • YOSHIFUJI Hideaki's avatar
    • David S. Miller's avatar
      Merge branch 'protodown' · 0d057881
      David S. Miller authored
      Anuradha Karuppiah says:
      
      ====================
      net: Introduce protodown flag.
      
      User space daemons can detect errors in the network that need to be
      notified to the switch device drivers.
      
      Drivers can react to this error state by doing a phy-down on the
      switch-port which would result in a carrier-off locally and on the directly
      connected switch. Doing that would prevent loops and black-holes in the
      network.
      
      One such use case is the multi-chassis LAG application -
      
      1. The MLAG application runs on peer switches (say Switch0 and Switch1)
         synchronizing states, forwarding entries etc. between the two
         switches over the peer-link (this is a link directly connecting the
         two switches).
      2. An MLAG election process designates one of the switches as a primary
         (for e.g. Switch0 is primary and Switch1 is secondary).
      3. The peer link plays a critical role in allowing Switch0-Switch1 to
         function as a single LAG partner to the downstream dual-connected
         servers. When the peer-link between the switches goes down we have a
         split-brain situation. Switch0 and Switch1 are no longer in sync and
         are acting independently. This can result in traffic loops and
         traffic black-holing in the network.
      4. To prevent these problems the MLAG application on the secondary
         switch phy-downs the MLAG ports on detecting the peer-link down.
         This will be seen as a carrier down on servers that are
         dual-connected to Switch0 and Switch1.
      5. Specifically a dual-connected server will see a carrier-down on the
         port connected to the MLAG secondary, Switch1, and will stop using
         that port for traffic TX. So traffic black holing is prevented.
      
      v6 to v7:
         Removed some unnecessary code in response to review comments.
      
      v5 to v6:
         Replaced proto_flags with a simple proto_down boolean attribute in
         response to Dave's comments.
      
      v4 to v5:
         Changed the ip link display format for protodown to match the set as
         recommended by Stephen.
      
      v3 to v4:
         I have moved protodown out of IFF_XXX and introduced a separate
         proto_flags field with IF_PROTOF_DOWN bit being used by apps to notify
         switch port errors. This is in response to Stephen's comments that
         adding a new IFF_XXX may break user space.
      
         I have used rocker as the sample switch driver. And to test this
         functionality I used the qemu-rocker patch that Scott sent out in
         response to the v3 posting (needed to set link up/down when phy is
         enabled/disabled).
      
      v1 to v2:
         Based on Dave's suggestion I have moved out aggregating of error bits
         across applications to a user space framework. This patch now simply
         notifies an aggregated error bit to drivers enabling them to handle
         the error gracefully.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0d057881