1. 11 Jun, 2023 3 commits
      RDMA/mlx5: Handle DCT QP logic separately from low level QP interface · afff2489
      Patrisious Haddad authored
      Previously, when destroying a DCT, the common resource was destroyed
      before the firmware object, so it was destroyed even if the firmware
      destruction command failed. This leads to the kernel warning
      "refcount_t: underflow", which indicates a possible use-after-free.
      The warning is triggered when we try to destroy the common resource a
      second time and execute refcount_dec_and_test(&common->refcount).
      
      So, fix the destruction order by factoring out the DCT QP logic into
      a separate XArray database.
      
      refcount_t: underflow; use-after-free.
      WARNING: CPU: 8 PID: 1002 at lib/refcount.c:28 refcount_warn_saturate+0xd8/0xe0
      Modules linked in: xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core overlay mlx5_core fuse
      CPU: 8 PID: 1002 Comm: python3 Not tainted 5.16.0-rc5+ #1
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      RIP: 0010:refcount_warn_saturate+0xd8/0xe0
      Code: ff 48 c7 c7 18 f5 23 82 c6 05 60 70 ff 00 01 e8 d0 0a 45 00 0f 0b c3 48 c7 c7 c0 f4 23 82 c6 05 4c 70 ff 00 01 e8 ba 0a 45 00 <0f> 0b c3 0f 1f 44 00 00 8b 07 3d 00 00 00 c0 74 12 83 f8 01 74 13
      RSP: 0018:ffff8881221d3aa8 EFLAGS: 00010286
      RAX: 0000000000000000 RBX: ffff8881313e8d40 RCX: ffff88852cc1b5c8
      RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff88852cc1b5c0
      RBP: ffff888100f70000 R08: ffff88853ffd1ba8 R09: 0000000000000003
      R10: 00000000fffff000 R11: 3fffffffffffffff R12: 0000000000000246
      R13: ffff888100f71fa0 R14: ffff8881221d3c68 R15: 0000000000000020
      FS:  00007efebbb13740(0000) GS:ffff88852cc00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00005611aac29f80 CR3: 00000001313de004 CR4: 0000000000370ea0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <TASK>
       destroy_resource_common+0x6e/0x95 [mlx5_ib]
       mlx5_core_destroy_rq_tracked+0x38/0xbe [mlx5_ib]
       mlx5_ib_destroy_wq+0x22/0x80 [mlx5_ib]
       ib_destroy_wq_user+0x1f/0x40 [ib_core]
       uverbs_free_wq+0x19/0x40 [ib_uverbs]
       destroy_hw_idr_uobject+0x18/0x50 [ib_uverbs]
       uverbs_destroy_uobject+0x2f/0x190 [ib_uverbs]
       uobj_destroy+0x3c/0x80 [ib_uverbs]
       ib_uverbs_cmd_verbs+0x3e4/0xb80 [ib_uverbs]
       ? uverbs_free_wq+0x40/0x40 [ib_uverbs]
       ? ip_list_rcv+0xf7/0x120
       ? netif_receive_skb_list_internal+0x1b6/0x2d0
       ? task_tick_fair+0xbf/0x450
       ? __handle_mm_fault+0x11fc/0x1450
       ib_uverbs_ioctl+0xa4/0x110 [ib_uverbs]
       __x64_sys_ioctl+0x3e4/0x8e0
       ? handle_mm_fault+0xb9/0x210
       do_syscall_64+0x3d/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x7efebc0be17b
      Code: 0f 1e fa 48 8b 05 1d ad 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ed ac 0c 00 f7 d8 64 89 01 48
      RSP: 002b:00007ffe71813e78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
      RAX: ffffffffffffffda RBX: 00007ffe71813fb8 RCX: 00007efebc0be17b
      RDX: 00007ffe71813fa0 RSI: 00000000c0181b01 RDI: 0000000000000005
      RBP: 00007ffe71813f80 R08: 00005611aae96020 R09: 000000000000004f
      R10: 00007efebbf9ffa0 R11: 0000000000000246 R12: 00007ffe71813f80
      R13: 00007ffe71813f4c R14: 00005611aae2eca0 R15: 00007efeae6c89d0
       </TASK>
      Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
      Link: https://lore.kernel.org/r/4470888466c8a898edc9833286967529cc5f3c0d.1685953497.git.leon@kernel.org
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      RDMA/mlx5: Reduce QP table exposure · 2ecfd946
      Leon Romanovsky authored
      driver.h is a common header for the whole mlx5 code base, but struct
      mlx5_qp_table is used only by the mlx5_ib driver. So move that struct
      under the sole responsibility of mlx5_ib.
      
      Link: https://lore.kernel.org/r/bec0dc1158e795813b135d1143147977f26bf668.1685953497.git.leon@kernel.org
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      net/mlx5: Nullify qp->dbg pointer post destruction · c023b61a
      Patrisious Haddad authored
      Nullifying qp->dbg prepares for the next patches in the series, in
      which mlx5_core_destroy_qp() can actually fail and may then be called
      again. If qp->dbg is not nullified in the first call, the second call
      causes a kernel crash.

      Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
      Link: https://lore.kernel.org/r/1677e52bb642fd8d6062d73a5aa69083c0283dc9.1685953497.git.leon@kernel.org
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
  2. 09 Jun, 2023 8 commits
  3. 01 Jun, 2023 10 commits
  4. 19 May, 2023 8 commits
  5. 17 May, 2023 4 commits
  6. 14 May, 2023 7 commits