    bonding: Fix incorrect deletion of ETH_P_8021AD protocol vid from slaves · 01f4fd27
    Ziyang Xuan authored
    BUG_ON(!vlan_info) is triggered in unregister_vlan_dev() with the
    following test case:
    
      # ip netns add ns1
      # ip netns exec ns1 ip link add bond0 type bond mode 0
      # ip netns exec ns1 ip link add bond_slave_1 type veth peer veth2
      # ip netns exec ns1 ip link set bond_slave_1 master bond0
      # ip netns exec ns1 ip link add link bond_slave_1 name vlan10 type vlan id 10 protocol 802.1ad
      # ip netns exec ns1 ip link add link bond0 name bond0_vlan10 type vlan id 10 protocol 802.1ad
      # ip netns exec ns1 ip link set bond_slave_1 nomaster
      # ip netns del ns1
    
    The logical analysis of the problem is as follows:
    
    1. create ETH_P_8021AD protocol vlan10 for bond_slave_1:
    register_vlan_dev()
      vlan_vid_add()
        vlan_info_alloc()
        __vlan_vid_add() // add [ETH_P_8021AD, 10] vid to bond_slave_1
    
    2. create ETH_P_8021AD protocol bond0_vlan10 for bond0:
    register_vlan_dev()
      vlan_vid_add()
        __vlan_vid_add()
          vlan_add_rx_filter_info()
              if (!vlan_hw_filter_capable(dev, proto)) // true, because bond0 lacks NETIF_F_HW_VLAN_STAG_FILTER
                  return 0;
    
              if (netif_device_present(dev))
                  return dev->netdev_ops->ndo_vlan_rx_add_vid(dev, proto, vid); // never reached
                  // So the slaves of bond0 never take a reference on the [ETH_P_8021AD, 10] vid.
    
    3. detach bond_slave_1 from bond0:
    __bond_release_one()
      vlan_vids_del_by_dev()
        list_for_each_entry(vid_info, &vlan_info->vid_list, list)
            vlan_vid_del(dev, vid_info->proto, vid_info->vid);
            // bond_slave_1 [ETH_P_8021AD, 10] vid will be deleted.
            // bond_slave_1->vlan_info will be assigned NULL.
    
    4. delete vlan10 while deleting ns1:
    default_device_exit_batch()
      dev->rtnl_link_ops->dellink() // unregister_vlan_dev() for vlan10
        vlan_info = rtnl_dereference(real_dev->vlan_info); // real_dev of vlan10 is bond_slave_1
        BUG_ON(!vlan_info); // bond_slave_1->vlan_info is NULL now, so the BUG_ON is triggered
    
    Add support for the S-VLAN tag related features to the bond driver, so
    that the bond driver always propagates the VLAN info to its slaves.
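    
    The refcount flow in steps 1-4 can be illustrated with a small user-space
    model. This is only a sketch, not kernel code: struct toy_dev, toy_vid_add(),
    toy_vid_del() and toy_run() are hypothetical names, a single
    [ETH_P_8021AD, 10] vid is tracked by a plain counter, and the bond has
    exactly one slave.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a net_device; every field here is an assumption. */
struct toy_dev {
    bool stag_filter;       /* models NETIF_F_HW_VLAN_STAG_FILTER */
    int vid_refs;           /* references on the [ETH_P_8021AD, 10] vid */
    bool has_vlan_info;     /* models dev->vlan_info != NULL */
    struct toy_dev *slave;  /* single slave, or NULL */
};

/* Models vlan_vid_add(): take a reference and, on the first add, push the
 * vid down to the slave only if the device is filter-capable. */
static void toy_vid_add(struct toy_dev *dev)
{
    dev->has_vlan_info = true;
    dev->vid_refs++;
    if (dev->vid_refs == 1 && dev->stag_filter && dev->slave)
        toy_vid_add(dev->slave);
}

/* Models vlan_vid_del(): drop a reference; when the last vid goes away,
 * vlan_info is released as well. */
static void toy_vid_del(struct toy_dev *dev)
{
    assert(dev->vid_refs > 0);
    if (--dev->vid_refs == 0)
        dev->has_vlan_info = false;
}

/* Replays steps 1-4. Returns true if bond_slave_1 still has vlan_info when
 * vlan10 is unregistered, i.e. if BUG_ON(!vlan_info) would not fire. */
static bool toy_run(bool bond_has_stag_filter)
{
    struct toy_dev slave = { .stag_filter = true };
    struct toy_dev bond = { .stag_filter = bond_has_stag_filter,
                            .slave = &slave };

    toy_vid_add(&slave);    /* step 1: vlan10 on bond_slave_1 */
    toy_vid_add(&bond);     /* step 2: bond0_vlan10 on bond0 */

    /* step 3: the release path deletes every vid known to the bond from
     * the slave, whether or not it was ever propagated. */
    if (bond.vid_refs > 0)
        toy_vid_del(&slave);
    bond.slave = NULL;

    /* step 4: unregister_vlan_dev() for vlan10 checks this pointer. */
    return slave.has_vlan_info;
}
```

    In this model toy_run(false) returns false (the slave's only reference is
    dropped in step 3), while toy_run(true) returns true because the add in
    step 2 is propagated and balances the delete in step 3.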
    
    Fixes: 8ad227ff ("net: vlan: add 802.1ad support")
    Suggested-by: Ido Schimmel <idosch@idosch.org>
    Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
    Reviewed-by: Ido Schimmel <idosch@nvidia.com>
    Link: https://lore.kernel.org/r/20230802114320.4156068-1-william.xuanziyang@huawei.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>