    net: switchdev: remove vid_begin -> vid_end range from VLAN objects · b7a9e0da
    Vladimir Oltean authored
    The call path of a switchdev VLAN addition to the bridge looks something
    like this today:
    
            nbp_vlan_init
            |  __br_vlan_set_default_pvid
            |  |                       |
            |  |    br_afspec          |
            |  |        |              |
            |  |        v              |
            |  | br_process_vlan_info  |
            |  |        |              |
            |  |        v              |
            |  |   br_vlan_info        |
            |  |       / \            /
            |  |      /   \          /
            |  |     /     \        /
            |  |    /       \      /
            v  v   v         v    v
          nbp_vlan_add   br_vlan_add ------+
           |              ^      ^ |       |
           |             /       | |       |
           |            /       /  /       |
           \ br_vlan_get_master/  /        v
            \        ^        /  /  br_vlan_add_existing
             \       |       /  /          |
              \      |      /  /          /
               \     |     /  /          /
                \    |    /  /          /
                 \   |   /  /          /
                  v  |   | v          /
                  __vlan_add         /
                     / |            /
                    /  |           /
                   v   |          /
       __vlan_vid_add  |         /
                   \   |        /
                    v  v        v
          br_switchdev_port_vlan_add
    
    The ranges UAPI was introduced to the bridge in commit bdced7ef
    ("bridge: support for multiple vlans and vlan ranges in setlink and
    dellink requests") (Jan 10 2015). But the VLAN ranges (parsed in br_afspec)
    have always been passed one by one, through struct bridge_vlan_info
    tmp_vinfo, to br_vlan_info. So the range never propagated very deep
    into the call path.
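    
    As a rough illustration of "passed one by one", here is a userspace
    sketch, not the bridge code itself. All names in it (vlan_info,
    process_range, record_vlan, seen) are hypothetical stand-ins. The
    caller unrolls the begin/end pair, so the per-VLAN handler only ever
    sees a single VID:

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_VID 4094	/* highest usable 802.1Q VLAN ID */

/* Hypothetical stand-in for the per-VLAN info handed to the handler. */
struct vlan_info {
	uint16_t vid;
	uint16_t flags;
};

/* The handler receives exactly one VLAN per call. */
typedef int (*vlan_handler_t)(const struct vlan_info *v, void *ctx);

/*
 * Expand [vid_begin, vid_end] into single-VID calls, so the range itself
 * never reaches the handler. Stops at the first error.
 */
static int process_range(uint16_t vid_begin, uint16_t vid_end, uint16_t flags,
			 vlan_handler_t handler, void *ctx)
{
	unsigned int vid;
	int err;

	if (vid_begin > vid_end || vid_end > SKETCH_MAX_VID)
		return -1;

	for (vid = vid_begin; vid <= vid_end; vid++) {
		struct vlan_info tmp = { .vid = (uint16_t)vid, .flags = flags };

		err = handler(&tmp, ctx);
		if (err)
			return err;
	}

	return 0;
}

/* Hypothetical handler that just records what it was called with. */
struct seen {
	unsigned int count;
	uint16_t last_vid;
};

static int record_vlan(const struct vlan_info *v, void *ctx)
{
	struct seen *s = ctx;

	s->count++;
	s->last_vid = v->vid;
	return 0;
}
```

    Calling process_range(10, 12, 0, record_vlan, &s) invokes the handler
    three times, each time with a single VID.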
    
    Then Scott Feldman introduced the switchdev_port_bridge_setlink function
    in commit 47f8328b ("switchdev: add new switchdev bridge setlink").
    That marked the introduction of the SWITCHDEV_OBJ_PORT_VLAN, which made
    full use of the range. But switchdev_port_bridge_setlink was called like
    this:
    
    br_setlink
    -> br_afspec
    -> switchdev_port_bridge_setlink
    
    Basically, the switchdev and the bridge code were not tightly integrated.
    Then commit 41c498b9 ("bridge: restore br_setlink back to original")
    came, and switchdev drivers were required to implement
    .ndo_bridge_setlink = switchdev_port_bridge_setlink for a while.
    
    In the meantime, commits such as 0944d6b5 ("bridge: try switchdev op
    first in __vlan_vid_add/del") finally made switchdev penetrate the
    br_vlan_info() barrier and start to develop the call path we have today.
    But remember, br_vlan_info() still receives VLANs one by one.
    
    Then Arkadi Sharshevsky refactored the switchdev API in 2017 in commit
    29ab586c ("net: switchdev: Remove bridge bypass support from
    switchdev") so that drivers would not implement .ndo_bridge_setlink any
    longer. The switchdev_port_bridge_setlink also got deleted.
    This refactoring removed the parallel bridge_setlink implementation from
    switchdev, and left the only switchdev VLAN objects to be the ones
    offloaded from __vlan_vid_add (basically RX filtering) and __vlan_add
    (the latter coming from commit 9c86ce2c ("net: bridge: Notify about
    bridge VLANs")).
    
    That is to say, today the switchdev VLAN object ranges are not used in
    the kernel. Refactoring the above call path so that it would use them
    is a bit complicated, when the bridge VLAN call path is already
    complicated as it is.
    
    Let's go off and finish the job of commit 29ab586c by deleting the
    bogus iteration through the VLAN ranges from the drivers. Some aspects
    of this feature never made too much sense in the first place. For
    example, what is a range of VLANs all having the BRIDGE_VLAN_INFO_PVID
    flag supposed to mean, when a port can obviously have a single pvid?
    This particular configuration _is_ denied as of commit 6623c60d
    ("bridge: vlan: enforce no pvid flag in vlan ranges"), but from an API
    perspective, the driver still has to play pretend, and only offload the
    vlan->vid_end as pvid. And the addition of a switchdev VLAN object can
    modify the flags of another, completely unrelated, switchdev VLAN
    object! (A VLAN that is PVID will invalidate the PVID flag of whatever
    other VLAN had previously been offloaded with switchdev and had that
    flag. Yet switchdev never notifies about that change; drivers are
    supposed to guess.)
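    
    One way a driver might do that guessing is sketched below, in
    userspace C. The names (sketch_port, sketch_vlan_add,
    SKETCH_VLAN_PVID) are hypothetical and not taken from any particular
    driver. The port remembers a single pvid, and a later addition
    carrying the PVID flag silently replaces it:

```c
#include <stdint.h>

/* Hypothetical flag for this sketch, standing in for BRIDGE_VLAN_INFO_PVID. */
#define SKETCH_VLAN_PVID (1u << 1)

struct sketch_port {
	uint16_t pvid;	/* 0 means "no pvid configured" */
};

/*
 * Handle a single-VID addition. There is no notification for the VLAN
 * that loses its PVID status; the driver infers it from the new addition.
 */
static void sketch_vlan_add(struct sketch_port *p, uint16_t vid, uint16_t flags)
{
	if (flags & SKETCH_VLAN_PVID)
		p->pvid = vid;	/* previous pvid, if any, is implicitly dropped */
	else if (p->pvid == vid)
		p->pvid = 0;	/* same VLAN re-added without the PVID flag */
}
```

    Adding VLAN 100 as PVID and then VLAN 200 as PVID leaves the port
    with pvid 200, although nothing ever said that VLAN 100 changed.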
    
    Moreover, having a VLAN range in the API makes error handling look
    scarier than it really is (unwinding on errors and all of that), when
    in reality no one calls this API with more than one VLAN.
    It is all unnecessary complexity.
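    
    To make the contrast concrete, here is a hedged userspace sketch
    with a made-up hardware model (sketch_hw, sketch_hw_vlan_set and both
    add functions are hypothetical, not code from any driver). The
    range-based handler needs a loop plus an unwind path, while the
    single-VID handler is one call:

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_VID 4094

/* Hypothetical hardware model: one "programmed" bit per VID. */
struct sketch_hw {
	bool programmed[SKETCH_MAX_VID + 1];
	uint16_t fail_vid;	/* VID whose programming fails; 0 = never */
};

static int sketch_hw_vlan_set(struct sketch_hw *hw, uint16_t vid, bool on)
{
	if (on && vid == hw->fail_vid)
		return -1;
	hw->programmed[vid] = on;
	return 0;
}

/*
 * Range-based object: a failure halfway through must be unwound.
 * Note that this naive unwind would also clear VIDs that were already
 * programmed before the call; a real driver would need extra bookkeeping.
 */
static int sketch_vlan_add_range(struct sketch_hw *hw, uint16_t vid_begin,
				 uint16_t vid_end)
{
	unsigned int vid;
	int err;

	if (vid_begin > vid_end || vid_end > SKETCH_MAX_VID)
		return -1;

	for (vid = vid_begin; vid <= vid_end; vid++) {
		err = sketch_hw_vlan_set(hw, (uint16_t)vid, true);
		if (err)
			goto unwind;
	}

	return 0;

unwind:
	/* Undo every VID programmed before the one that failed. */
	while (vid-- > vid_begin)
		sketch_hw_vlan_set(hw, (uint16_t)vid, false);
	return err;
}

/* Single-VID object: nothing to iterate over, nothing to unwind. */
static int sketch_vlan_add_one(struct sketch_hw *hw, uint16_t vid)
{
	if (vid > SKETCH_MAX_VID)
		return -1;
	return sketch_hw_vlan_set(hw, vid, true);
}
```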
    
    And despite appearing pretentious (two-phase transactional model and
    all), the switchdev API is really sloppy because the VLAN addition and
    removal operations are not paired with one another (you can add a VLAN
    100 times and delete it just once). The bridge notifies through
    switchdev of a VLAN addition not only when the flags of an existing VLAN
    change, but also when nothing changes. There are switchdev drivers out
    there that don't like adding a VLAN that has already been added, and
    those checks don't really belong at driver level. But the fact that the
    API contains ranges is yet another factor that prevents this from being
    addressed in the future.
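    
    The kind of check that ends up at driver level today could look
    roughly like this userspace sketch (sketch_vlan_db and its helpers
    are hypothetical names). Because additions and removals are not
    paired, a reference count would not work. The sketch instead treats a
    repeated addition as a no-op and counts how often the "hardware"
    would actually be touched:

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_N_VID 4096

struct sketch_vlan_db {
	bool present[SKETCH_N_VID];
	unsigned int hw_writes;	/* how many times hardware was reprogrammed */
};

/* Add a VLAN; a duplicate notification does not touch the hardware. */
static int sketch_vlan_db_add(struct sketch_vlan_db *db, uint16_t vid)
{
	if (vid >= SKETCH_N_VID)
		return -1;
	if (db->present[vid])
		return 0;

	db->present[vid] = true;
	db->hw_writes++;
	return 0;
}

/* Delete a VLAN; a single delete undoes any number of earlier adds. */
static int sketch_vlan_db_del(struct sketch_vlan_db *db, uint16_t vid)
{
	if (vid >= SKETCH_N_VID || !db->present[vid])
		return -1;

	db->present[vid] = false;
	db->hw_writes++;
	return 0;
}
```

    With a single VID per object, a check like this could in principle
    live in one common place instead of in every driver. With ranges it
    would have to be repeated for each VID in the range.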
    
    Of the existing switchdev pieces of hardware, it appears that only
    Mellanox Spectrum supports offloading more than one VLAN at a time,
    through mlxsw_sp_port_vlan_set. I have kept that code internal to the
    driver, because there is some more bookkeeping that makes use of it, but
    I deleted it from the switchdev API. But since the switchdev support for
    ranges has already been de facto deleted by a Mellanox employee and
    nobody noticed for 4 years, I'm going to assume it's not a biggie.
    
    Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    Reviewed-by: Ido Schimmel <idosch@nvidia.com> # switchdev and mlxsw
    Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
    Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> # hellcreek
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>