• Nikolay Aleksandrov's avatar
    bonding: fix curr_active_slave/carrier with loadbalance arp monitoring · b8e4500f
    Nikolay Aleksandrov authored
    Since commit 6fde8f03 ("bonding: fix locking in
    bond_loadbalance_arp_mon()") we can have a stale bond carrier state and
    stale curr_active_slave when using arp monitoring in loadbalance modes. The
    reason is that in bond_loadbalance_arp_mon() we can't have
    do_failover == true but slave_state_changed == false, whenever do_failover
    is true then slave_state_changed is also true. Then the following piece
    from bond_loadbalance_arp_mon():
                    if (slave_state_changed) {
                            bond_slave_state_change(bond);
                            if (BOND_MODE(bond) == BOND_MODE_XOR)
                                    bond_update_slave_arr(bond, NULL);
                    } else if (do_failover) {
                            block_netpoll_tx();
                            bond_select_active_slave(bond);
                            unblock_netpoll_tx();
                    }
    
    will execute only the first branch, always and regardless of do_failover.
    Since these two events aren't related in such way, we need to decouple and
    consider them separately.
    
    For example this issue could lead to the following result:
    Bonding Mode: load balancing (round-robin)
    *MII Status: down*
    MII Polling Interval (ms): 0
    Up Delay (ms): 0
    Down Delay (ms): 0
    ARP Polling Interval (ms): 100
    ARP IP target/s (n.n.n.n form): 192.168.9.2
    
    Slave Interface: ens12
    *MII Status: up*
    Speed: 10000 Mbps
    Duplex: full
    Link Failure Count: 2
    Permanent HW addr: 00:0f:53:01:42:2c
    Slave queue ID: 0
    
    Slave Interface: eth1
    *MII Status: up*
    Speed: Unknown
    Duplex: Unknown
    Link Failure Count: 70
    Permanent HW addr: 52:54:00:2f:0f:8e
    Slave queue ID: 0
    
    Since some interfaces are up, then the status of the bond should also be
    up, but it will never change unless something invokes bond_set_carrier()
    (i.e. enslave, bond_select_active_slave etc). Now, if I force the
    calling of bond_select_active_slave via for example changing
    primary_reselect (it can change in any mode), then the MII status goes to
    "up" because it calls bond_select_active_slave() which should've been done
    from bond_loadbalance_arp_mon() itself.
    
    CC: Veaceslav Falico <vfalico@gmail.com>
    CC: Jay Vosburgh <j.vosburgh@gmail.com>
    CC: Andy Gospodarek <andy@greyhouse.net>
    CC: Ding Tianhong <dingtianhong@huawei.com>
    
    Fixes: 6fde8f03 ("bonding: fix locking in bond_loadbalance_arp_mon()")
    Signed-off-by: default avatarNikolay Aleksandrov <nikolay@redhat.com>
    Acked-by: default avatarVeaceslav Falico <vfalico@gmail.com>
    Acked-by: default avatarAndy Gospodarek <gospo@cumulusnetworks.com>
    Acked-by: default avatarDing Tianhong <dingtianhong@huawei.com>
    Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    b8e4500f
bond_main.c 127 KB