• Vladimir Oltean's avatar
    net: dsa: sja1105: make VID 4095 a bridge VLAN too · e40cba94
    Vladimir Oltean authored
    This simple series of commands:
    
    ip link add br0 type bridge vlan_filtering 1
    ip link set swp0 master br0
    
    fails on sja1105 with the following error:
    [   33.439103] sja1105 spi0.1: vlan-lookup-table needs to have at least the default untagged VLAN
    [   33.447710] sja1105 spi0.1: Invalid config, cannot upload
    Warning: sja1105: Failed to change VLAN Ethertype.
    
    For context, sja1105 has 3 operating modes:
    - SJA1105_VLAN_UNAWARE: the dsa_8021q_vlans are committed to hardware
    - SJA1105_VLAN_FILTERING_FULL: the bridge_vlans are committed to hardware
    - SJA1105_VLAN_FILTERING_BEST_EFFORT: both the dsa_8021q_vlans and the
      bridge_vlans are committed to hardware
    
    Swapping out a VLAN list and another in happens in
    sja1105_build_vlan_table(), which performs a delta update procedure.
    That function is called from a few places, notably from
    sja1105_vlan_filtering() which is called from the
    SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING handler.
    
    The above set of 2 commands fails when run on a kernel pre-commit
    8841f6e6 ("net: dsa: sja1105: make devlink property
    best_effort_vlan_filtering true by default"). So the priv->vlan_state
    transition that takes place is between VLAN-unaware and full VLAN
    filtering. So the dsa_8021q_vlans are swapped out and the bridge_vlans
    are swapped in.
    
    So why does it fail?
    
    Well, the bridge driver, through nbp_vlan_init(), first sets up the
    SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING attribute, and only then
    proceeds to call nbp_vlan_add for the default_pvid.
    
    So when we swap out the dsa_8021q_vlans and swap in the bridge_vlans in
    the SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING handler, there are no bridge
    VLANs (yet). So we have wiped the VLAN table clean, and the low-level
    static config checker complains of an invalid configuration. We _will_
    add the bridge VLANs using the dynamic config interface, albeit later,
    when nbp_vlan_add() calls us. So it is natural that it fails.
    
    So why did it ever work?
    
    Surprisingly, it looks like I only tested this configuration with 2
    things set up in a particular way:
    - a network manager that brings all ports up
    - a kernel with CONFIG_VLAN_8021Q=y
    
    It is widely known that commit ad1afb00 ("vlan_dev: VLAN 0 should be
    treated as "no vlan tag" (802.1p packet)") installs VID 0 to every net
    device that comes up. DSA treats these VLANs as bridge VLANs, and
    therefore, in my testing, the list of bridge_vlans was never empty.
    
    However, if CONFIG_VLAN_8021Q is not enabled, or the port is not up when
    it joins a VLAN-aware bridge, the bridge_vlans list will be temporarily
    empty, and the sja1105_static_config_reload() call from
    sja1105_vlan_filtering() will fail.
    
    To fix this, the simplest thing is to keep VID 4095, the one used for
    CPU-injected control packets since commit ed040abc ("net: dsa:
    sja1105: use 4095 as the private VLAN for untagged traffic"), in the
    list of bridge VLANs too, not just the list of tag_8021q VLANs. This
    ensures that the list of bridge VLANs will never be empty.
    
    Fixes: ec5ae610 ("net: dsa: sja1105: save/restore VLANs using a delta commit method")
    Reported-by: default avatarRadu Pirea (NXP OSS) <radu-nicolae.pirea@oss.nxp.com>
    Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
    Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    e40cba94
sja1105_main.c 108 KB