    IB/hfi1: Resolve kernel panics by reference counting receive contexts · f683c80c
    Michael J. Ruhl authored
    Base receive contexts can be used by sub contexts.  Because of this,
    resources for the context cannot be completely freed until all sub
    contexts are done using the base context.
    
    Introduce a reference count so that the base receive context can be
    freed only when all sub contexts are done with it.
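    
    The idea can be sketched in user-space C as below.  This is only an
    illustration under stated assumptions, not the driver's code: struct
    base_ctxt, base_ctxt_create(), base_ctxt_get(), base_ctxt_put() and
    base_ctxt_frees are hypothetical names, and C11 atomics stand in for
    the kernel's own reference counting helpers such as struct kref.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a base receive context; not the hfi1 struct. */
struct base_ctxt {
    atomic_int refcount;  /* one reference per user of the context */
    int *rcv_buffers;     /* resource shared with the sub contexts */
};

/* Counts how many contexts were released; for demonstration only. */
static int base_ctxt_frees;

/* Allocate a context; the creator holds the first reference. */
static struct base_ctxt *base_ctxt_create(void)
{
    struct base_ctxt *ctxt = calloc(1, sizeof(*ctxt));

    if (!ctxt)
        return NULL;
    ctxt->rcv_buffers = calloc(16, sizeof(*ctxt->rcv_buffers));
    if (!ctxt->rcv_buffers) {
        free(ctxt);
        return NULL;
    }
    atomic_init(&ctxt->refcount, 1);
    return ctxt;
}

/* Called when a sub context starts using the base context. */
static void base_ctxt_get(struct base_ctxt *ctxt)
{
    atomic_fetch_add(&ctxt->refcount, 1);
}

/*
 * Called when any user is done with the base context.
 * Returns 1 if this call dropped the last reference and freed the
 * context, 0 if other users still hold references.
 */
static int base_ctxt_put(struct base_ctxt *ctxt)
{
    int old = atomic_fetch_sub(&ctxt->refcount, 1);

    assert(old > 0);  /* a put without a matching get is a bug */
    if (old != 1)
        return 0;
    free(ctxt->rcv_buffers);
    free(ctxt);
    base_ctxt_frees++;
    return 1;
}
```

    In this sketch, closing the base context while a sub context still
    holds a reference only decrements the count; the memory is released
    by whichever caller performs the final put.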
    
    Use the provided function call for setting default send context
    integrity rather than the manual method.
    
    The cleanup path does not set all variables back to NULL after freeing
    resources.  Since the cleanup code can be called more than once
    (e.g. during context close and on the error path), it is necessary to
    make sure that all the variables are set to NULL.
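    
    A minimal user-space sketch of that rule follows.  The names struct
    ctxt_resources, eager_bufs, tid_groups and ctxt_cleanup() are
    hypothetical and chosen only for illustration.  It relies on free(NULL)
    being a no-op, which also holds for kfree(NULL) in the kernel.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical set of per-context resources; not the hfi1 layout. */
struct ctxt_resources {
    void *eager_bufs;
    void *tid_groups;
};

/*
 * Safe to call more than once: every pointer is reset to NULL right
 * after it is freed, so a second call only sees NULL pointers and
 * free(NULL) does nothing.
 */
static void ctxt_cleanup(struct ctxt_resources *res)
{
    free(res->eager_bufs);
    res->eager_bufs = NULL;

    free(res->tid_groups);
    res->tid_groups = NULL;
}
```

    Without the NULL assignments, a second call would pass stale pointers
    to free(), which is a double free.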
    
    Possible crashes are:
    
    BUG: unable to handle kernel paging request at 0000000001908900
    IP: read_csr+0x24/0x30 [hfi1]
    RIP: 0010:read_csr+0x24/0x30 [hfi1]
    Call Trace:
     sc_disable+0x40/0x110 [hfi1]
     hfi1_file_close+0x16f/0x360 [hfi1]
     __fput+0xe7/0x210
     ____fput+0xe/0x10
    
    or
    
    kernel BUG at mm/slub.c:3877!
    RIP: 0010:kfree+0x14f/0x170
    Call Trace:
     hfi1_free_ctxtdata+0x19a/0x2b0 [hfi1]
     ? hfi1_user_exp_rcv_grp_free+0x73/0x80 [hfi1]
     hfi1_file_close+0x20f/0x360 [hfi1]
     __fput+0xe7/0x210
     ____fput+0xe/0x10
    
    Fixes: Commit 62239fc6 ("IB/hfi1: Clean up on context initialization failure")
    Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
    Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
    Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
    Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>