    IB/hfi1: Fix AIP early init panic · 5f8f55b9
    Mike Marciniszyn authored
    An early failure in hfi1_ipoib_setup_rn() can lead to the following panic:
    
      BUG: unable to handle kernel NULL pointer dereference at 00000000000001b0
      PGD 0 P4D 0
      Oops: 0002 [#1] SMP NOPTI
      Workqueue: events work_for_cpu_fn
      RIP: 0010:try_to_grab_pending+0x2b/0x140
      Code: 1f 44 00 00 41 55 41 54 55 48 89 d5 53 48 89 fb 9c 58 0f 1f 44 00 00 48 89 c2 fa 66 0f 1f 44 00 00 48 89 55 00 40 84 f6 75 77 <f0> 48 0f ba 2b 00 72 09 31 c0 5b 5d 41 5c 41 5d c3 48 89 df e8 6c
      RSP: 0018:ffffb6b3cf7cfa48 EFLAGS: 00010046
      RAX: 0000000000000246 RBX: 00000000000001b0 RCX: 0000000000000000
      RDX: 0000000000000246 RSI: 0000000000000000 RDI: 00000000000001b0
      RBP: ffffb6b3cf7cfa70 R08: 0000000000000f09 R09: 0000000000000001
      R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
      R13: ffffb6b3cf7cfa90 R14: ffffffff9b2fbfc0 R15: ffff8a4fdf244690
      FS:  0000000000000000(0000) GS:ffff8a527f400000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00000000000001b0 CR3: 00000017e2410003 CR4: 00000000007706f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      PKRU: 55555554
      Call Trace:
       __cancel_work_timer+0x42/0x190
       ? dev_printk_emit+0x4e/0x70
       iowait_cancel_work+0x15/0x30 [hfi1]
       hfi1_ipoib_txreq_deinit+0x5a/0x220 [hfi1]
       ? dev_err+0x6c/0x90
       hfi1_ipoib_netdev_dtor+0x15/0x30 [hfi1]
       hfi1_ipoib_setup_rn+0x10e/0x150 [hfi1]
       rdma_init_netdev+0x5a/0x80 [ib_core]
       ? hfi1_ipoib_free_rdma_netdev+0x20/0x20 [hfi1]
       ipoib_intf_init+0x6c/0x350 [ib_ipoib]
       ipoib_intf_alloc+0x5c/0xc0 [ib_ipoib]
       ipoib_add_one+0xbe/0x300 [ib_ipoib]
       add_client_context+0x12c/0x1a0 [ib_core]
       enable_device_and_get+0xdc/0x1d0 [ib_core]
       ib_register_device+0x572/0x6b0 [ib_core]
       rvt_register_device+0x11b/0x220 [rdmavt]
       hfi1_register_ib_device+0x6b4/0x770 [hfi1]
       do_init_one.isra.20+0x3e3/0x680 [hfi1]
       local_pci_probe+0x41/0x90
       work_for_cpu_fn+0x16/0x20
       process_one_work+0x1a7/0x360
       ? create_worker+0x1a0/0x1a0
       worker_thread+0x1cf/0x390
       ? create_worker+0x1a0/0x1a0
       kthread+0x116/0x130
       ? kthread_flush_work_fn+0x10/0x10
       ret_from_fork+0x1f/0x40
    
    The panic is a NULL dereference in hfi1_ipoib_txreq_deinit(), reached
    when hfi1_ipoib_netdev_dtor() is called in this error case.
    
    hfi1_ipoib_txreq_init() and hfi1_ipoib_rxq_init() are self-unwinding, so
    fix the panic by adjusting the error paths accordingly.
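
    As an illustration only, the self-unwinding pattern can be sketched in
    plain userspace C. Every name below (demo_priv, demo_txq_init(),
    demo_rxq_init(), demo_setup()) is hypothetical and is not taken from the
    hfi1 driver; the sketch only shows why a caller must not run a full
    destructor after a self-unwinding init step has failed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for driver private data; not the hfi1 structures. */
struct demo_priv {
	int *txq;	/* NULL until demo_txq_init() succeeds */
	int *rxq;	/* NULL until demo_rxq_init() succeeds */
};

/* Self-unwinding: on failure, nothing is left allocated or published. */
static int demo_txq_init(struct demo_priv *p, int fail)
{
	int *q = calloc(4, sizeof(*q));

	if (!q)
		return -1;
	if (fail) {		/* simulate a later init step failing */
		free(q);	/* undo our own partial work */
		return -1;
	}
	p->txq = q;
	return 0;
}

static void demo_txq_deinit(struct demo_priv *p)
{
	free(p->txq);
	p->txq = NULL;
}

/* Same contract as demo_txq_init(): failure leaves p->rxq untouched. */
static int demo_rxq_init(struct demo_priv *p, int fail)
{
	int *q = calloc(4, sizeof(*q));

	if (!q)
		return -1;
	if (fail) {
		free(q);
		return -1;
	}
	p->rxq = q;
	return 0;
}

/* Setup undoes only the steps that completed; it never calls a full dtor. */
static int demo_setup(struct demo_priv *p, int fail_tx, int fail_rx)
{
	int rc;

	assert(p && !p->txq && !p->rxq);	/* caller passes a zeroed priv */

	rc = demo_txq_init(p, fail_tx);
	if (rc)
		return rc;	/* nothing to undo: tx init unwound itself */

	rc = demo_rxq_init(p, fail_rx);
	if (rc)
		goto out_txq;	/* rx init unwound itself; undo tx only */

	return 0;

out_txq:
	demo_txq_deinit(p);
	return rc;
}
```

    In this sketch, a failure in the first step returns directly, and a
    failure in the second step tears down only the first. No path touches
    state that was never initialized.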
    
    Other changes:
    - hfi1_ipoib_free_rdma_netdev() is deleted, including its free_netdev()
      call, since the netdev core code calls free_netdev()
    - The switch to the accelerated entrances is moved to the success path.
    
    Cc: stable@vger.kernel.org
    Fixes: d99dc602 ("IB/hfi1: Add functions to transmit datagram ipoib packets")
    Link: https://lore.kernel.org/r/1642287756-182313-4-git-send-email-mike.marciniszyn@cornelisnetworks.com
    Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
    Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
ipoib_main.c 5.74 KB