    bonding: fix active-backup failover for current ARP slave · 0410d071
    Jiri Wiesner authored
    When the ARP monitor is used for link detection, ARP replies are
    validated for all slaves (arp_validate=3), and fail_over_mac is set
    to active, two slaves of an active-backup bond may get stuck in a
    state where both of them are active and pass the packets they receive
    to the bond. This state makes IPv6 duplicate address detection fail.
    The state is reached as follows:
    1. The current active slave goes down because the ARP target
       is not reachable.
    2. The current ARP slave is chosen and made active.
    3. A new slave is enslaved. This new slave becomes the current active
       slave and can reach the ARP target.
    As a result, the current ARP slave stays active after the enslave
    action has finished and the log is littered with "PROBE BAD" messages:
    > bond0: PROBE: c_arp ens10 && cas ens11 BAD
    The workaround is to remove the slave with "going back" status from
    the bond and re-enslave it. This issue was encountered when DPDK PMD
    interfaces were being enslaved to an active-backup bond.
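
    The sequence above can be replayed with a toy model. This is only a
    sketch: struct toy_slave, struct toy_bond and both functions are
    hypothetical and are not the kernel's data structures. The interface
    names come from the log line, and the model assumes that the slave
    that went down in step 1 is also the one picked as the current ARP
    slave in step 2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical types, far simpler than the kernel's
 * struct slave and struct bonding. */
struct toy_slave {
    const char *name;
    bool active;                    /* true: received packets reach the bond */
};

struct toy_bond {
    struct toy_slave *curr_active;  /* current active slave ("cas")  */
    struct toy_slave *curr_arp;     /* current ARP slave ("c_arp")   */
};

/* Count the slaves that currently pass packets to the bond. */
static int toy_count_active(const struct toy_slave *s, size_t n)
{
    int count = 0;

    assert(s != NULL);
    for (size_t i = 0; i < n; i++)
        if (s[i].active)
            count++;
    return count;
}

/* Replay steps 1-3 and return how many slaves end up active. */
int toy_replay_stuck_state(void)
{
    struct toy_slave slaves[2] = {
        { "ens10", true  },         /* initially the current active slave */
        { "ens11", false },         /* enslaved later, in step 3          */
    };
    struct toy_bond bond = { &slaves[0], NULL };

    /* 1. ARP target unreachable: the current active slave goes down. */
    bond.curr_active->active = false;
    bond.curr_active = NULL;

    /* 2. A current ARP slave is chosen and made active (assumed: ens10). */
    bond.curr_arp = &slaves[0];
    bond.curr_arp->active = true;

    /* 3. A new slave is enslaved and becomes the current active slave.
     *    Nothing on this path deactivates the current ARP slave. */
    bond.curr_active = &slaves[1];
    bond.curr_active->active = true;

    return toy_count_active(slaves, 2);
}
```

    In this model toy_replay_stuck_state() returns 2: both the current
    ARP slave and the current active slave are left active, which is
    the state the "PROBE BAD" message reports.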
    
    It would be possible to fix the issue in bond_enslave() or
    bond_change_active_slave() but the ARP monitor was fixed instead to
    keep most of the actions changing the current ARP slave in the ARP
    monitor code. The current ARP slave is set as inactive and backup
    during the commit phase. A new state, BOND_LINK_FAIL, has been
    introduced for slaves in the context of the ARP monitor. This allows
    administrators to see how slaves are rotated for sending ARP requests
    and how attempts are made to find a new active slave.
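
    The commit-phase change can be sketched on a toy model as well. The
    types and functions below are hypothetical and do not reproduce the
    code in bond_main.c. Two details are assumptions of the sketch and
    not statements about the patch: the current ARP slave is only
    demoted when a different slave is the current active slave, and the
    current ARP slave pointer is cleared afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical types, not the kernel's structures. */
struct toy_slave {
    const char *name;
    bool active;                    /* true: received packets reach the bond */
    bool backup;                    /* true: slave is in the backup role     */
};

struct toy_bond {
    struct toy_slave *curr_active;  /* current active slave */
    struct toy_slave *curr_arp;     /* current ARP slave    */
};

/* Hypothetical commit step: set the current ARP slave as inactive and
 * backup once another slave is the current active slave. */
void toy_arp_commit(struct toy_bond *bond)
{
    struct toy_slave *arp;

    assert(bond != NULL);
    arp = bond->curr_arp;

    /* Assumption: nothing to do without an ARP slave, or when the ARP
     * slave is itself the current active slave. */
    if (arp == NULL || arp == bond->curr_active)
        return;

    arp->active = false;
    arp->backup = true;
    bond->curr_arp = NULL;          /* assumption: pointer is cleared */
}

/* Start from the stuck state (both slaves active), run the commit
 * step, and return how many slaves remain active. */
int toy_active_after_commit(void)
{
    struct toy_slave slaves[2] = {
        { "ens10", true, false },   /* current ARP slave    */
        { "ens11", true, false },   /* current active slave */
    };
    struct toy_bond bond = { &slaves[1], &slaves[0] };
    int count = 0;

    toy_arp_commit(&bond);

    for (size_t i = 0; i < 2; i++)
        if (slaves[i].active)
            count++;
    return count;
}
```

    In this model toy_active_after_commit() returns 1: only the current
    active slave keeps passing packets to the bond.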
    
    Fixes: b2220cad ("bonding: refactor ARP active-backup monitor")
    Signed-off-by: Jiri Wiesner <jwiesner@suse.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
bond_main.c 147 KB