1. 12 Jun, 2023 13 commits
  2. 11 Jun, 2023 8 commits
    • RDMA/erdma: Refactor the original doorbell allocation mechanism · 3b3dfd58
      Cheng Xu authored
      The original doorbell allocation mechanism is complex and does not meet
      the isolation requirement. So we introduce a new doorbell mechanism, and the
      original mechanism (used only with CAP_SYS_RAWIO when hardware does not
      support the new mechanism) needs to be kept as simple as possible for
      compatibility.
      Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
      Link: https://lore.kernel.org/r/20230606055005.80729-5-chengyou@linux.alibaba.com
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
    • RDMA/erdma: Associate QPs/CQs with doorbells for authorization · 6534de1f
      Cheng Xu authored
      To meet the isolation requirement, each QP/CQ may issue doorbells only
      from its allocated MMIO space. The create_qp/create_cq interfaces now
      configure the relationship between QPs/CQs and MMIO doorbell spaces in
      hardware.
      Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
      Link: https://lore.kernel.org/r/20230606055005.80729-4-chengyou@linux.alibaba.com
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
    • RDMA/erdma: Allocate doorbell resources from hardware · 7e9a1dad
      Cheng Xu authored
      Each ucontext tries to allocate doorbell resources in the extended BAR
      space from hardware. For compatibility, the original BAR space is left
      unchanged, and it is used only by applications with the CAP_SYS_RAWIO
      capability in older HW/FW environments.
      Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
      Link: https://lore.kernel.org/r/20230606055005.80729-3-chengyou@linux.alibaba.com
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
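      The fallback rule described in the erdma doorbell commits above can be sketched in user-space C as follows. This is only an illustration: `select_doorbell_path`, `enum db_path`, and the choice of `-EPERM` for the "no path available" case are assumptions made for the example and are not taken from the driver.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the two doorbell mechanisms named in the commits. */
enum db_path {
    DB_PATH_EXTENDED, /* new mechanism: extended BAR space */
    DB_PATH_LEGACY    /* original mechanism: original BAR space */
};

/*
 * Hypothetical decision helper. Prefers the extended BAR space when the
 * hardware supports it; otherwise allows the original space only for a
 * caller holding CAP_SYS_RAWIO. Returning -EPERM when neither applies is
 * an assumption of this sketch.
 */
int select_doorbell_path(bool hw_supports_ext, bool has_rawio,
                         enum db_path *out)
{
    assert(out != NULL);

    if (hw_supports_ext) {
        *out = DB_PATH_EXTENDED;
        return 0;
    }
    if (has_rawio) {
        *out = DB_PATH_LEGACY;
        return 0;
    }
    return -EPERM;
}
```

      An unprivileged caller on older hardware gets an error in this sketch, which is one way to express that the original space does not meet the isolation requirement.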
    • RDMA/erdma: Configure PAGE_SIZE to hardware · 128f8404
      Cheng Xu authored
      Add a new CMDQ message to configure hardware. Initially the page size
      (expressed as a shift) is passed to hardware, so that hardware can
      organize the MMIO space properly. The message is sent only if hardware
      supports it.
      Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
      Link: https://lore.kernel.org/r/20230606055005.80729-2-chengyou@linux.alibaba.com
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
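      To make "page size in the format of shift" concrete: a shift is the base-2 logarithm of a power-of-two page size, so a 4096-byte page is described by shift 12. The helper below is a minimal user-space sketch; `page_size_to_shift` is a hypothetical name, not a function from the driver.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: return log2(page_size) for a power-of-two page
 * size, i.e. the value s such that (1 << s) == page_size.
 */
unsigned int page_size_to_shift(size_t page_size)
{
    unsigned int shift = 0;

    /* Only a non-zero power of two can be described by a shift. */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

    while ((page_size >> shift) != 1)
        shift++;
    return shift;
}
```

      Passing a shift rather than a byte count keeps the value small and rules out sizes that are not powers of two.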
    • RDMA/mlx5: Return the firmware result upon destroying QP/RQ · 22664c06
      Patrisious Haddad authored
      Previously, when destroying a QP/RQ, the result of the firmware
      destruction function was ignored and upper layers weren't informed
      about the failure.
      This could lead to various problems: an upper layer that isn't aware
      of the failure continues its operation as if the related QP/RQ had
      been successfully destroyed when it actually hadn't, which could lead
      to the kernel WARN below.
      
      Now we return the correct firmware destruction status to upper
      layers. For an RQ that is mlx5_ib_destroy_wq(), which was already
      capable of handling RQ destruction failure; for a QP it is
      destroy_qp_common(), which now actually warns upon QP destruction
      failure.
      
      WARNING: CPU: 3 PID: 995 at drivers/infiniband/core/rdma_core.c:940 uverbs_destroy_ufile_hw+0xcb/0xe0 [ib_uverbs]
      Modules linked in: xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi rdma_cm ib_umad ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core overlay mlx5_core fuse
      CPU: 3 PID: 995 Comm: python3 Not tainted 5.16.0-rc5+ #1
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      RIP: 0010:uverbs_destroy_ufile_hw+0xcb/0xe0 [ib_uverbs]
      Code: 41 5c 41 5d 41 5e e9 44 34 f0 e0 48 89 df e8 4c 77 ff ff 49 8b 86 10 01 00 00 48 85 c0 74 a1 4c 89 e7 ff d0 eb 9a 0f 0b eb c1 <0f> 0b be 04 00 00 00 48 89 df e8 b6 f6 ff ff e9 75 ff ff ff 90 0f
      RSP: 0018:ffff8881533e3e78 EFLAGS: 00010287
      RAX: ffff88811b2cf3e0 RBX: ffff888106209700 RCX: 0000000000000000
      RDX: ffff888106209780 RSI: ffff8881533e3d30 RDI: ffff888109b101a0
      RBP: 0000000000000001 R08: ffff888127cb381c R09: 0de9890000000009
      R10: ffff888127cb3800 R11: 0000000000000000 R12: ffff888106209780
      R13: ffff888106209750 R14: ffff888100f20660 R15: 0000000000000000
      FS:  00007f8be353b740(0000) GS:ffff88852c980000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f8bd5b117c0 CR3: 000000012cd8a004 CR4: 0000000000370ea0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <TASK>
       ib_uverbs_close+0x1a/0x90 [ib_uverbs]
       __fput+0x82/0x230
       task_work_run+0x59/0x90
       exit_to_user_mode_prepare+0x138/0x140
       syscall_exit_to_user_mode+0x1d/0x50
       ? __x64_sys_close+0xe/0x40
       do_syscall_64+0x4a/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x7f8be3ae0abb
      Code: 03 00 00 00 0f 05 48 3d 00 f0 ff ff 77 41 c3 48 83 ec 18 89 7c 24 0c e8 83 43 f9 ff 8b 7c 24 0c 41 89 c0 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 35 44 89 c7 89 44 24 0c e8 c1 43 f9 ff 8b 44
      RSP: 002b:00007ffdb51909c0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
      RAX: 0000000000000000 RBX: 0000557bb7f7c020 RCX: 00007f8be3ae0abb
      RDX: 0000557bb7c74010 RSI: 0000557bb7f14ca0 RDI: 0000000000000005
      RBP: 0000557bb7fbd598 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000293 R12: 0000557bb7fbd5b8
      R13: 0000557bb7fbd5a8 R14: 0000000000001000 R15: 0000557bb7f7c020
       </TASK>
      Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
      Link: https://lore.kernel.org/r/c6df677f931d18090bafbe7f7dbb9524047b7d9b.1685953497.git.leon@kernel.org
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
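      The difference between dropping and returning the firmware status, as described in the commit above, can be sketched with a mock in user-space C. Everything here (`struct fake_queue`, `fake_fw_destroy`, and both wrappers) is hypothetical and only models the pattern; it is not the mlx5 code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a hardware queue object. */
struct fake_queue {
    bool fw_destroy_fails; /* knob: make the mock firmware call fail */
    bool destroyed;        /* set only when the mock firmware succeeds */
};

/* Mock firmware command: returns 0 on success or a negative errno. */
int fake_fw_destroy(struct fake_queue *q)
{
    assert(q != NULL);
    if (q->fw_destroy_fails)
        return -EIO;
    q->destroyed = true;
    return 0;
}

/* Old pattern: the status is dropped, so the caller always sees success. */
int destroy_ignoring_result(struct fake_queue *q)
{
    (void)fake_fw_destroy(q);
    return 0;
}

/* New pattern: the firmware status is handed back to the caller. */
int destroy_returning_result(struct fake_queue *q)
{
    return fake_fw_destroy(q);
}
```

      With the first wrapper, a caller cannot tell that the object still exists in the mock firmware; with the second, it can stop and report the failure.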
    • RDMA/mlx5: Handle DCT QP logic separately from low level QP interface · afff2489
      Patrisious Haddad authored
      Previously, when destroying a DCT, if the firmware function for the
      destruction failed, the common resource was destroyed anyway,
      since it was destroyed before the firmware object.
      This leads to the kernel warning "refcount_t: underflow", which
      indicates a possible use-after-free. It is triggered when we try to
      destroy the common resource a second time and execute
      refcount_dec_and_test(&common->refcount).
      
      So, let's fix the destruction order by factoring out the DCT QP logic
      into a separate XArray database.
      
      refcount_t: underflow; use-after-free.
      WARNING: CPU: 8 PID: 1002 at lib/refcount.c:28 refcount_warn_saturate+0xd8/0xe0
      Modules linked in: xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core overlay mlx5_core fuse
      CPU: 8 PID: 1002 Comm: python3 Not tainted 5.16.0-rc5+ #1
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      RIP: 0010:refcount_warn_saturate+0xd8/0xe0
      Code: ff 48 c7 c7 18 f5 23 82 c6 05 60 70 ff 00 01 e8 d0 0a 45 00 0f 0b c3 48 c7 c7 c0 f4 23 82 c6 05 4c 70 ff 00 01 e8 ba 0a 45 00 <0f> 0b c3 0f 1f 44 00 00 8b 07 3d 00 00 00 c0 74 12 83 f8 01 74 13
      RSP: 0018:ffff8881221d3aa8 EFLAGS: 00010286
      RAX: 0000000000000000 RBX: ffff8881313e8d40 RCX: ffff88852cc1b5c8
      RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff88852cc1b5c0
      RBP: ffff888100f70000 R08: ffff88853ffd1ba8 R09: 0000000000000003
      R10: 00000000fffff000 R11: 3fffffffffffffff R12: 0000000000000246
      R13: ffff888100f71fa0 R14: ffff8881221d3c68 R15: 0000000000000020
      FS:  00007efebbb13740(0000) GS:ffff88852cc00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00005611aac29f80 CR3: 00000001313de004 CR4: 0000000000370ea0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <TASK>
       destroy_resource_common+0x6e/0x95 [mlx5_ib]
       mlx5_core_destroy_rq_tracked+0x38/0xbe [mlx5_ib]
       mlx5_ib_destroy_wq+0x22/0x80 [mlx5_ib]
       ib_destroy_wq_user+0x1f/0x40 [ib_core]
       uverbs_free_wq+0x19/0x40 [ib_uverbs]
       destroy_hw_idr_uobject+0x18/0x50 [ib_uverbs]
       uverbs_destroy_uobject+0x2f/0x190 [ib_uverbs]
       uobj_destroy+0x3c/0x80 [ib_uverbs]
       ib_uverbs_cmd_verbs+0x3e4/0xb80 [ib_uverbs]
       ? uverbs_free_wq+0x40/0x40 [ib_uverbs]
       ? ip_list_rcv+0xf7/0x120
       ? netif_receive_skb_list_internal+0x1b6/0x2d0
       ? task_tick_fair+0xbf/0x450
       ? __handle_mm_fault+0x11fc/0x1450
       ib_uverbs_ioctl+0xa4/0x110 [ib_uverbs]
       __x64_sys_ioctl+0x3e4/0x8e0
       ? handle_mm_fault+0xb9/0x210
       do_syscall_64+0x3d/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x7efebc0be17b
      Code: 0f 1e fa 48 8b 05 1d ad 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ed ac 0c 00 f7 d8 64 89 01 48
      RSP: 002b:00007ffe71813e78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
      RAX: ffffffffffffffda RBX: 00007ffe71813fb8 RCX: 00007efebc0be17b
      RDX: 00007ffe71813fa0 RSI: 00000000c0181b01 RDI: 0000000000000005
      RBP: 00007ffe71813f80 R08: 00005611aae96020 R09: 000000000000004f
      R10: 00007efebbf9ffa0 R11: 0000000000000246 R12: 00007ffe71813f80
      R13: 00007ffe71813f4c R14: 00005611aae2eca0 R15: 00007efeae6c89d0
       </TASK>
      Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
      Link: https://lore.kernel.org/r/4470888466c8a898edc9833286967529cc5f3c0d.1685953497.git.leon@kernel.org
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
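      The ordering problem described in the commit above can be reproduced with a small user-space model. The sketch below covers only the destruction order, not the XArray refactoring, and every name in it (`struct fake_dct`, `fake_ref_put`, the two destroy functions) is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical common resource guarded by a plain integer refcount. */
struct fake_common {
    int refcount;
};

struct fake_dct {
    struct fake_common common;
    bool fw_destroy_fails; /* knob: make the mock firmware call fail */
};

/* Drop one reference; record an underflow instead of going negative. */
void fake_ref_put(struct fake_common *c, bool *underflow)
{
    assert(c != NULL && underflow != NULL);
    if (c->refcount <= 0) {
        *underflow = true;
        return;
    }
    c->refcount--;
}

/* Mock firmware command: returns 0 on success or a negative errno. */
int fake_fw_destroy_dct(const struct fake_dct *d)
{
    return d->fw_destroy_fails ? -EIO : 0;
}

/* Problematic order: the common resource goes first, even if firmware fails. */
int destroy_common_first(struct fake_dct *d, bool *underflow)
{
    fake_ref_put(&d->common, underflow);
    return fake_fw_destroy_dct(d);
}

/* Fixed order: touch the common resource only after firmware succeeds. */
int destroy_fw_first(struct fake_dct *d, bool *underflow)
{
    int err = fake_fw_destroy_dct(d);

    if (err)
        return err;
    fake_ref_put(&d->common, underflow);
    return 0;
}
```

      In this model a retried destroy after a firmware failure underflows the count only with the first ordering; with the second, the count is untouched until the firmware call succeeds.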
    • RDMA/mlx5: Reduce QP table exposure · 2ecfd946
      Leon Romanovsky authored
      driver.h is a header common to the whole mlx5 code base, but struct
      mlx5_qp_table is used only in the mlx5_ib driver. So move that struct
      to be the sole responsibility of mlx5_ib.
      
      Link: https://lore.kernel.org/r/bec0dc1158e795813b135d1143147977f26bf668.1685953497.git.leon@kernel.org
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
    • net/mlx5: Nullify qp->dbg pointer post destruction · c023b61a
      Patrisious Haddad authored
      Nullifying qp->dbg is preparation for the next patches in the
      series, in which mlx5_core_destroy_qp() can actually fail and then
      be called again. Without this change the second call causes a kernel
      crash, since qp->dbg was not nullified by the previous call.
      Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
      Link: https://lore.kernel.org/r/1677e52bb642fd8d6062d73a5aa69083c0283dc9.1685953497.git.leon@kernel.org
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
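      The reason for clearing the pointer can be shown with a small user-space sketch: a teardown helper that may run twice must leave the pointer in a state the second call can detect. `struct fake_qp` and `fake_debug_remove` are hypothetical names, and `malloc`/`free` stand in for whatever the real debug object uses.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical QP with an optional, separately allocated debug object. */
struct fake_qp {
    int *dbg;
};

/*
 * Hypothetical teardown helper that is safe to call more than once:
 * it frees the debug object and then clears the pointer, so a repeated
 * call sees NULL and returns without touching freed memory.
 */
void fake_debug_remove(struct fake_qp *qp)
{
    assert(qp != NULL);
    if (qp->dbg == NULL)
        return;
    free(qp->dbg);
    qp->dbg = NULL;
}
```

      Without the final assignment, a second call in this sketch would free the same pointer twice, which is undefined behavior in C.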
  3. 09 Jun, 2023 8 commits
  4. 01 Jun, 2023 10 commits
  5. 19 May, 2023 1 commit