    net: dsa: sja1105: fix reception from VLAN-unaware bridges · 1f9fc48f
    Vladimir Oltean authored
    The blamed commit introduced an unexpected regression in the sja1105
    driver. Packets from VLAN-unaware bridge ports get received correctly,
    but the protocol stack can't seem to decode them properly.
    
    For ds->untag_bridge_pvid users (thus also sja1105), the blamed commit
    did introduce a functional change: dsa_switch_rcv() used to call
    dsa_untag_bridge_pvid(), which looked like this:
    
    	err = br_vlan_get_proto(br, &proto);
    	if (err)
    		return skb;
    
    	/* Move VLAN tag from data to hwaccel */
    	if (!skb_vlan_tag_present(skb) && skb->protocol == htons(proto)) {
    		skb = skb_vlan_untag(skb);
    		if (!skb)
    			return NULL;
    	}
    
    and now it calls dsa_software_vlan_untag() which has just this:
    
    	/* Move VLAN tag from data to hwaccel */
    	if (!skb_vlan_tag_present(skb)) {
    		skb = skb_vlan_untag(skb);
    		if (!skb)
    			return NULL;
    	}
    
    which thus lacks any skb->protocol == bridge VLAN protocol check. That
    comparison is deferred to a later check on skb->vlan_proto (in the
    hwaccel area).
    
    The new code is problematic because, for VLAN-untagged packets,
    skb_vlan_untag() blindly takes the 4 bytes starting with the EtherType
    and turns them into a hwaccel VLAN tag. This is what breaks the protocol
    stack.
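
    To make the failure concrete, here is a minimal userspace sketch (not
    kernel code) of what happens when the 4 bytes starting at the EtherType
    of a VLAN-untagged IPv4 frame are read as a VLAN tag. All toy_* names
    and the sample frame are hypothetical, invented for illustration. The
    sketch also ignores details of the real skb_vlan_untag(), such as skb
    data pointer handling and how the encapsulated protocol gets classified.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ETH_ALEN	6
#define TOY_MIN_LEN	(2 * TOY_ETH_ALEN + 6)	/* MACs + 4 "tag" bytes + 2 */

/* Hypothetical stand-in for the fields an skb would carry after untagging. */
struct toy_vlan_result {
	uint16_t vlan_proto;		/* TPID recorded in the "hwaccel" area */
	uint16_t vlan_tci;		/* TCI recorded in the "hwaccel" area */
	uint16_t next_ethertype;	/* raw 16 bits following the consumed tag */
};

/* VLAN-untagged IPv4 frame: dst MAC, src MAC, EtherType 0x0800, IP header start. */
static const uint8_t toy_untagged_ipv4[TOY_MIN_LEN] = {
	0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
	0x00, 0x11, 0x22, 0x33, 0x44, 0x55,
	0x08, 0x00,		/* EtherType: IPv4 */
	0x45, 0x00,		/* IPv4 version/IHL, DSCP/ECN */
	0x00, 0x54,		/* IPv4 total length */
};

uint16_t toy_get_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Blind variant: consumes 4 bytes as a VLAN tag without checking the TPID. */
struct toy_vlan_result toy_untag_blind(const uint8_t *frame, size_t len)
{
	const uint8_t *p = frame + 2 * TOY_ETH_ALEN;
	struct toy_vlan_result res;

	assert(len >= TOY_MIN_LEN);
	res.vlan_proto = toy_get_be16(p);
	res.vlan_tci = toy_get_be16(p + 2);
	res.next_ethertype = toy_get_be16(p + 4);
	return res;
}

/*
 * Checked variant, in the spirit of the old skb->protocol == htons(proto)
 * test: only consume the tag if the EtherType matches the expected TPID.
 * Returns 1 if a tag was consumed, 0 if the frame was left alone.
 */
int toy_untag_checked(const uint8_t *frame, size_t len, uint16_t tpid,
		      struct toy_vlan_result *res)
{
	const uint8_t *p = frame + 2 * TOY_ETH_ALEN;

	assert(len >= TOY_MIN_LEN);
	if (toy_get_be16(p) != tpid) {
		res->vlan_proto = 0;
		res->vlan_tci = 0;
		res->next_ethertype = toy_get_be16(p);
		return 0;
	}
	*res = toy_untag_blind(frame, len);
	return 1;
}
```

    With the sample frame, the blind variant records 0x0800 (the IPv4
    EtherType) as the VLAN protocol and the first two IP header bytes as
    the TCI, and what is left no longer starts at a valid EtherType. The
    checked variant with a TPID of 0x8100 leaves the frame untouched.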
    
    It would be tempting to "make it work as before" and only call
    skb_vlan_untag() for those packets with the skb->protocol actually
    representing a VLAN.
    
    But the premise of the newly introduced dsa_software_vlan_untag() core
    function is not wrong. Drivers set ds->untag_bridge_pvid or
    ds->untag_vlan_aware_bridge_pvid presumably because they send all
    traffic to the CPU reception path as VLAN-tagged. So why should we spend
    any additional CPU cycles assuming that the packet may be VLAN-untagged?
    And why does the sja1105 driver opt into ds->untag_bridge_pvid if it
    doesn't always deliver packets to the CPU as VLAN-tagged?
    
    The answer to the latter question is indeed more interesting: it doesn't
    need to. The opt-in was added in commit 884be12f ("net: dsa: sja1105: add
    support for imprecise RX"), because I thought it would be needed, but I
    didn't realize that it doesn't actually make a difference.
    
    As explained in the commit message of the blamed patch, ds->untag_bridge_pvid
    only makes a difference in the VLAN-untagged receive path of a bridge port.
    However, in that operating mode, tag_sja1105.c makes use of VLAN tags
    with the ETH_P_SJA1105 TPID, and it decodes and consumes these VLAN tags
    as if they were DSA tags (aka tag_8021q operation). Even though commit
    884be12f ("net: dsa: sja1105: add support for imprecise RX") added
    this logic in sja1105_bridge_vlan_add():
    
    	/* Always install bridge VLANs as egress-tagged on the CPU port. */
    	if (dsa_is_cpu_port(ds, port))
    		flags = 0;
    
    that was for _bridge_ VLANs, which are _not_ committed to hardware
    in VLAN-unaware mode (aka the mode where ds->untag_bridge_pvid does
    anything at all). Even prior to that change, the tag_8021q VLANs
    were always installed as egress-tagged on the CPU port, see
    dsa_switch_tag_8021q_vlan_add():
    
    	u16 flags = 0; // egress-tagged, non-PVID
    
    	if (dsa_port_is_user(dp))
    		flags |= BRIDGE_VLAN_INFO_UNTAGGED |
    			 BRIDGE_VLAN_INFO_PVID;
    
    	err = dsa_port_do_tag_8021q_vlan_add(dp, info->vid,
    					     flags);
    	if (err)
    		return err;
    
    Whether the sja1105 driver needs the new flag, ds->untag_vlan_aware_bridge_pvid,
    rather than ds->untag_bridge_pvid, is a separate discussion. To fix the
    current bug in VLAN-unaware bridge mode, I would argue that the sja1105
    driver should not request something it doesn't need, rather than
    complicating the core DSA helper. Before the blamed commit, this
    setting was harmless; now it causes breakage.
    
    Fixes: 93e4649e ("net: dsa: provide a software untagging function on RX for VLAN-aware bridges")
    Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    Link: https://patch.msgid.link/20241001140206.50933-1-vladimir.oltean@nxp.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>