1. 04 Nov, 2011 1 commit
      IB/qib: Fix panic in RC error flushing logic · 30ab7e23
      Mike Marciniszyn authored
      The following panic can occur when flushing a QP:
      
          RIP: 0010:[<ffffffffa0168e8b>]  [<ffffffffa0168e8b>] qib_send_complete+0x3b/0x190 [ib_qib]
          RSP: 0018:ffff8803cdc6fc90  EFLAGS: 00010046
          RAX: 0000000000000000 RBX: ffff8803d84ba000 RCX: 0000000000000000
          RDX: 0000000000000005 RSI: ffffc90015a53430 RDI: ffff8803d84ba000
          RBP: ffff8803cdc6fce0 R08: ffff8803cdc6fc90 R09: 0000000000000001
          R10: 00000000ffffffff R11: 0000000000000000 R12: ffff8803d84ba0c0
          R13: ffff8803d84ba5cc R14: 0000000000000800 R15: 0000000000000246
          FS:  0000000000000000(0000) GS:ffff880036600000(0000) knlGS:0000000000000000
          CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
          CR2: 0000000000000034 CR3: 00000003e44f9000 CR4: 00000000000406f0
          DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
          DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
          Process qib/0 (pid: 1350, threadinfo ffff8803cdc6e000, task ffff88042728a100)
          Stack:
           53544c5553455201 0000000100000005 0000000000000000 ffff8803d84ba000
           0000000000000000 0000000000000000 0000000000000000 0000000000000000
           0000000000000000 0000000000000001 ffff8803cdc6fd30 ffffffffa0165d7a
          Call Trace:
           [<ffffffffa0165d7a>] qib_make_rc_req+0x36a/0xe80 [ib_qib]
           [<ffffffffa0165a10>] ?  qib_make_rc_req+0x0/0xe80 [ib_qib]
           [<ffffffffa01698b3>] qib_do_send+0xf3/0xb60 [ib_qib]
           [<ffffffff814db757>] ? thread_return+0x4e/0x777
           [<ffffffffa01697c0>] ? qib_do_send+0x0/0xb60 [ib_qib]
           [<ffffffff81088bf0>] worker_thread+0x170/0x2a0
           [<ffffffff8108e530>] ?  autoremove_wake_function+0x0/0x40
           [<ffffffff81088a80>] ? worker_thread+0x0/0x2a0
           [<ffffffff8108e1c6>] kthread+0x96/0xa0
           [<ffffffff8100c1ca>] child_rip+0xa/0x20
           [<ffffffff8108e130>] ? kthread+0x0/0xa0
           [<ffffffff8100c1c0>] ? child_rip+0x0/0x20
          RIP  [<ffffffffa0168e8b>] qib_send_complete+0x3b/0x190 [ib_qib]
      
      The RC error state flush logic in qib_make_rc_req() could return all
      of the acked wqes and potentially empty the queue.  It would then
      unconditionally try to return a flush completion via
      qib_send_complete() for an invalid wqe or, worse, a valid one that is
      not queued.  The panic results when the completion code tries to
      maintain an MR reference count for a NULL MR.
      
      This fix modifies the logic to send only one completion per
      qib_make_rc_req() call, changing the completion status from
      IB_WC_SUCCESS to IB_WC_WR_FLUSH_ERR as the completions progress.
      
      The outer loop will call qib_make_rc_req() as many times as necessary
      to flush the queue.
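
The flush behaviour described above can be sketched as a small user-space model. This is not the driver code: `toy_qp`, `toy_send_complete()`, `toy_flush_one()`, `toy_flush_all()`, the ring size and the `WC_*` status names are all hypothetical stand-ins, and the assumption that the queue is empty when `s_last == s_head` is made for illustration only. The sketch shows one completion per call, with the status switching from success to flush error once the acked entries have been returned.

```c
#include <assert.h>
#include <stddef.h>

#define SQ_SIZE 8            /* hypothetical send-queue ring size */
#define MAX_COMPLETIONS 32   /* capacity of the recorded-completion log */

/* Hypothetical stand-ins for IB_WC_SUCCESS and IB_WC_WR_FLUSH_ERR. */
enum wc_status { WC_SUCCESS, WC_WR_FLUSH_ERR };

/* Toy queue pair: three ring indices plus a log of generated completions. */
struct toy_qp {
    unsigned s_last;   /* oldest entry not yet completed */
    unsigned s_acked;  /* oldest entry not yet acked by the peer */
    unsigned s_head;   /* next free slot; s_last == s_head means empty */
    enum wc_status completions[MAX_COMPLETIONS];
    size_t n_completions;
};

/* Complete the entry at s_last and advance the ring indices. */
static void toy_send_complete(struct toy_qp *qp, enum wc_status status)
{
    assert(qp->s_last != qp->s_head);            /* never complete an unqueued entry */
    assert(qp->n_completions < MAX_COMPLETIONS);
    qp->completions[qp->n_completions++] = status;

    /* If nothing acked remains, s_acked moves forward together with s_last. */
    if (qp->s_acked == qp->s_last)
        qp->s_acked = (qp->s_acked + 1) % SQ_SIZE;
    qp->s_last = (qp->s_last + 1) % SQ_SIZE;
}

/*
 * One flush step: at most one completion per call.
 * Returns 1 if a completion was generated (call again), 0 if the queue is empty.
 */
static int toy_flush_one(struct toy_qp *qp)
{
    if (qp->s_last == qp->s_head)
        return 0;                                /* empty: nothing to complete */

    /* Entries before s_acked were acked, so they complete successfully. */
    toy_send_complete(qp, qp->s_last != qp->s_acked ? WC_SUCCESS
                                                    : WC_WR_FLUSH_ERR);
    return 1;
}

/* Outer loop: keep calling until the queue is drained; returns the call count. */
static size_t toy_flush_all(struct toy_qp *qp)
{
    size_t calls = 0;

    while (toy_flush_one(qp))
        calls++;
    return calls;
}
```

With `s_last = 0`, `s_acked = 2` and `s_head = 5`, the model produces two success completions followed by three flush errors, and an already-empty queue produces none. Checking for the empty queue before every completion is what keeps the sketch from completing an entry that is not queued.
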
      Reviewed-by: Ram Vepa <ram.vepa@qlogic.com>
      Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
  2. 01 Nov, 2011 39 commits