• Chuck Lever's avatar
    xprtrdma: Fix BUG after a device removal · e89e8d8f
    Chuck Lever authored
    Michal Kalderon reports a BUG that occurs just after device removal:
    
    [  169.112490] rpcrdma: removing device qedr0 for 192.168.110.146:20049
    [  169.143909] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
    [  169.181837] IP: rpcrdma_dma_unmap_regbuf+0xa/0x60 [rpcrdma]
    
    The RPC/RDMA client transport attempts to allocate some resources
    on demand. Registered buffers are one such resource. These are
    allocated (or re-allocated) by xprt_rdma_allocate to hold RPC Call
    and Reply messages. A hardware resource is associated with each of
    these buffers, as they can be used for a Send or Receive Work
    Request.
    
    If a device is removed from under an NFS/RDMA mount, the transport
    layer is responsible for releasing all hardware resources before
    the device can be finally unplugged. A BUG results when the NFS
    mount hasn't yet seen much activity: the transport tries to release
    resources that haven't yet been allocated.
    
    rpcrdma_free_regbuf() already checks for this case, so just move
    that check to cover the DEVICE_REMOVAL case as well.
    Reported-by: default avatarMichal Kalderon <Michal.Kalderon@cavium.com>
    Fixes: bebd0318 ("xprtrdma: Support unplugging an HCA ...")
    Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
    Tested-by: default avatarMichal Kalderon <Michal.Kalderon@cavium.com>
    Cc: stable@vger.kernel.org # v4.12+
    Signed-off-by: default avatarAnna Schumaker <Anna.Schumaker@Netapp.com>
    e89e8d8f
verbs.c 40.2 KB