• Ido Schimmel's avatar
    mlxsw: spectrum_acl_tcam: Fix memory leak during rehash · 8ca3f7a7
    Ido Schimmel authored
    The rehash delayed work migrates filters from one region to another.
    This is done by iterating over all chunks (all the filters with the same
    priority) in the region and in each chunk iterating over all the
    filters.
    
    If the migration fails, the code tries to migrate the filters back to
    the old region. However, the rollback itself can also fail in which case
    another migration will be erroneously performed. Besides the fact that
    this ping pong is not a very good idea, it also creates a problem.
    
    Each virtual chunk references two chunks: The currently used one
    ('vchunk->chunk') and a backup ('vchunk->chunk2'). During migration the
    first holds the chunk we want to migrate filters to and the second holds
    the chunk we are migrating filters from.
    
    The code currently assumes - but does not verify - that the backup chunk
    does not exist (NULL) if the currently used chunk does not reference the
    target region. This assumption breaks when we are trying to rollback a
    rollback, resulting in the backup chunk being overwritten and leaked
    [1].
    
    Fix by not rolling back a failed rollback and add a warning to avoid
    future cases.
    
    [1]
    WARNING: CPU: 5 PID: 1063 at lib/parman.c:291 parman_destroy+0x17/0x20
    Modules linked in:
    CPU: 5 PID: 1063 Comm: kworker/5:11 Tainted: G        W          6.9.0-rc2-custom-00784-gc6a05c468a0b #14
    Hardware name: Mellanox Technologies Ltd. MSN3700/VMOD0005, BIOS 5.11 01/06/2019
    Workqueue: mlxsw_core mlxsw_sp_acl_tcam_vregion_rehash_work
    RIP: 0010:parman_destroy+0x17/0x20
    [...]
    Call Trace:
     <TASK>
     mlxsw_sp_acl_atcam_region_fini+0x19/0x60
     mlxsw_sp_acl_tcam_region_destroy+0x49/0xf0
     mlxsw_sp_acl_tcam_vregion_rehash_work+0x1f1/0x470
     process_one_work+0x151/0x370
     worker_thread+0x2cb/0x3e0
     kthread+0xd0/0x100
     ret_from_fork+0x34/0x50
     ret_from_fork_asm+0x1a/0x30
     </TASK>
    
    Fixes: 84350051 ("mlxsw: spectrum_acl: Do rollback as another call to mlxsw_sp_acl_tcam_vchunk_migrate_all()")
    Signed-off-by: default avatarIdo Schimmel <idosch@nvidia.com>
    Tested-by: default avatarAlexander Zubkov <green@qrator.net>
    Reviewed-by: default avatarPetr Machata <petrm@nvidia.com>
    Signed-off-by: default avatarPetr Machata <petrm@nvidia.com>
    Reviewed-by: default avatarSimon Horman <horms@kernel.org>
    Link: https://lore.kernel.org/r/d5edd4f4503934186ae5cfe268503b16345b4e0f.1713797103.git.petrm@nvidia.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
    8ca3f7a7
spectrum_acl_tcam.c 54.4 KB