    RDMA/core: Clean up cq pool mechanism · 286e1d3f
    Jack Morgenstein authored
    The CQ pool mechanism had two problems:
    
    1. The CQ pool lists were uninitialized in the device registration error
       flow.  As a result, all the list pointers remained NULL.  This caused
       the kernel to crash (in procedure ib_cq_pool_destroy) when that error
       flow was taken (and unregister called).  The stack trace snippet:
    
         BUG: kernel NULL pointer dereference, address: 0000000000000000
         #PF: supervisor read access in kernel mode
         #PF: error_code(0x0000) - not-present page
         PGD 0 P4D 0
         Oops: 0000 [#1] SMP PTI
         . . .
         RIP: 0010:ib_cq_pool_destroy+0x1b/0x70 [ib_core]
         . . .
         Call Trace:
          disable_device+0x9f/0x130 [ib_core]
          __ib_unregister_device+0x35/0x90 [ib_core]
          ib_register_device+0x529/0x610 [ib_core]
          __mlx5_ib_add+0x3a/0x70 [mlx5_ib]
          mlx5_add_device+0x87/0x1c0 [mlx5_core]
          mlx5_register_interface+0x74/0xc0 [mlx5_core]
          do_one_initcall+0x4b/0x1f4
          do_init_module+0x5a/0x223
          load_module+0x1938/0x1d40
    
    2. At device unregister, when cleaning up the cq pool, the cq's in the
       pool lists were freed, but the cq entries were left in the list.
    
    The fix for the first issue is to initialize the cq pool lists when the
    ib_device structure is allocated for a new device (in procedure
    _ib_alloc_device).
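
As a rough userspace illustration of the first fix (not the kernel code itself), the sketch below initializes every pool list at allocation time so that a later traversal is safe even if registration fails early. The names `fake_device`, `fake_alloc_device`, `fake_pool_count`, and `NR_POLL_CTX` are hypothetical, and `struct list_head` here is a minimal stand-in for the kernel type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal userspace stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/* An empty circular list points back at its own head. */
static void init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

static int list_is_empty(const struct list_head *h)
{
    return h->next == h;
}

#define NR_POLL_CTX 3 /* hypothetical number of per-context pools */

/* Hypothetical device holding one CQ pool list per polling context. */
struct fake_device {
    struct list_head cq_pools[NR_POLL_CTX];
};

/*
 * Initialize the pool lists at allocation time. With only calloc(),
 * every next/prev pointer would be NULL and the first traversal in an
 * error-path cleanup would dereference NULL.
 */
static struct fake_device *fake_alloc_device(void)
{
    struct fake_device *dev = calloc(1, sizeof(*dev));

    if (!dev)
        return NULL;
    for (size_t i = 0; i < NR_POLL_CTX; i++)
        init_list_head(&dev->cq_pools[i]);
    return dev;
}

/* Walk every pool list; safe only because the heads were initialized. */
static size_t fake_pool_count(const struct fake_device *dev)
{
    size_t n = 0;

    assert(dev != NULL);
    for (size_t i = 0; i < NR_POLL_CTX; i++) {
        const struct list_head *head = &dev->cq_pools[i];

        for (const struct list_head *p = head->next; p != head; p = p->next)
            n++;
    }
    return n;
}
```

Calling `fake_pool_count()` on a device fresh from `fake_alloc_device()` walks three empty lists and returns zero; with zeroed heads the same walk would fault on the first `p->next`.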
    
    The fix for the second problem is to delete cq entries from the pool lists
    when cleaning up the cq pool.
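
The second fix can be sketched in userspace along the same lines: each entry is unlinked from the pool list before it is freed, so the list is left empty rather than pointing at freed memory. This is an illustrative sketch only; `fake_cq`, `fake_cq_add`, `fake_pool_cleanup`, and the list helpers are hypothetical stand-ins, not the kernel's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal userspace stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* Link entry just before head, i.e. at the tail of the list. */
static void list_add_tail_entry(struct list_head *entry, struct list_head *head)
{
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

/* Unlink entry from whatever list it is on. */
static void list_del_entry(struct list_head *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    entry->next = NULL;
    entry->prev = NULL;
}

/* Hypothetical pooled CQ; pool_entry is first so the cast below is valid. */
struct fake_cq {
    struct list_head pool_entry;
    int id;
};

static struct fake_cq *fake_cq_add(struct list_head *pool, int id)
{
    struct fake_cq *cq = malloc(sizeof(*cq));

    if (!cq)
        return NULL;
    cq->id = id;
    list_add_tail_entry(&cq->pool_entry, pool);
    return cq;
}

/*
 * Unlink each entry before freeing it. Freeing without unlinking would
 * leave the head pointing at freed memory. Returns the number freed.
 */
static size_t fake_pool_cleanup(struct list_head *pool)
{
    size_t freed = 0;

    assert(pool != NULL);
    while (pool->next != pool) {
        struct fake_cq *cq = (struct fake_cq *)pool->next;

        list_del_entry(&cq->pool_entry);
        free(cq);
        freed++;
    }
    return freed;
}
```

Because the head is empty again after cleanup, running `fake_pool_cleanup()` a second time on the same list is harmless and frees nothing.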
    
    In addition, procedure ib_cq_pool_destroy() is renamed to the more
    appropriate name ib_cq_pool_cleanup().
    
    Fixes: 4aa16152 ("RDMA/core: Fix ordering of CQ pool destruction")
    Link: https://lore.kernel.org/r/20201208073545.9723-2-leon@kernel.org
    Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
    Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
    Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>