• Vladimir Oltean's avatar
    net: dsa: flush switchdev workqueue when leaving the bridge · d7d0d423
    Vladimir Oltean authored
    DSA is preparing to offer switch drivers an API through which they can
    associate each FDB entry with a struct net_device *bridge_dev. This can
    be used to perform FDB isolation (the FDB lookup performed on the
    ingress of a standalone, or bridged port, should not find an FDB entry
    that is present in the FDB of another bridge).
    
    In preparation of that work, DSA needs to ensure that by the time we
    call the switch .port_fdb_add and .port_fdb_del methods, the
    dp->bridge_dev pointer is still valid, i.e. the port is still a bridge
    port.
    
    This is not guaranteed because the SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE API
    requires drivers that must have sleepable context to handle those events
    to schedule the deferred work themselves. DSA does this through the
    dsa_owq.
    
    It can happen that a port leaves a bridge, del_nbp() flushes the FDB on
    that port, SWITCHDEV_FDB_DEL_TO_DEVICE is notified in atomic context,
    DSA schedules its deferred work, but del_nbp() finishes unlinking the
    bridge as a master from the port before DSA's deferred work is run.
    
    Fundamentally, the port must not be unlinked from the bridge until all
    FDB deletion deferred work items have been flushed. The bridge must wait
    for the completion of these hardware accesses.
    
    An attempt has been made to address this issue centrally in switchdev by
    making SWITCHDEV_FDB_DEL_TO_DEVICE deferred (=> blocking) at the switchdev
    level, which would offer implicit synchronization with del_nbp:
    
    https://patchwork.kernel.org/project/netdevbpf/cover/20210820115746.3701811-1-vladimir.oltean@nxp.com/
    
    but it seems that any attempt to modify switchdev's behavior and make
    the events blocking there would introduce undesirable side effects in
    other switchdev consumers.
    
    The most undesirable behavior seems to be that
    switchdev_deferred_process_work() takes the rtnl_mutex itself, which
    would be worse off than having the rtnl_mutex taken individually from
    drivers which is what we have now (except DSA which has removed that
    lock since commit 0faf890f ("net: dsa: drop rtnl_lock from
    dsa_slave_switchdev_event_work")).
    
    So to offer the needed guarantee to DSA switch drivers, I have come up
    with a compromise solution that does not require switchdev rework:
    we already have a hook at the last moment in time when the bridge is
    still an upper of ours: the NETDEV_PRECHANGEUPPER handler. We can flush
    the dsa_owq manually from there, which makes all FDB deletions
    synchronous.
    Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
    Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    d7d0d423
port.c 32.1 KB