1. 10 Mar, 2024 1 commit
      RDMA/cm: add timeout to cm_destroy_id wait · 96d9cbe2
      Manjunath Patil authored
      Add timeout to cm_destroy_id, so that userspace can trigger any data
      collection that would help in analyzing the cause of delay in destroying
      the cm_id.
      
      A new noinline function gives dtrace/eBPF programs a place to hook.
      Existing functionality is unchanged, except that the new probe-able
      function is called at every timeout interval.
      
      We have seen cases where CM messages get stuck in the MAD layer (due to
      either a software bug or a faulty HCA), leaving the cm_id stuck in the
      following call stack. This patch helps resolve such issues faster.
      
      kernel: ... INFO: task XXXX:56778 blocked for more than 120 seconds.
      ...
      	Call Trace:
      	__schedule+0x2bc/0x895
      	schedule+0x36/0x7c
      	schedule_timeout+0x1f6/0x31f
      	? __slab_free+0x19c/0x2ba
      	wait_for_completion+0x12b/0x18a
      	? wake_up_q+0x80/0x73
      	cm_destroy_id+0x345/0x610 [ib_cm]
      	ib_destroy_cm_id+0x10/0x20 [ib_cm]
      	rdma_destroy_id+0xa8/0x300 [rdma_cm]
      	ucma_destroy_id+0x13e/0x190 [rdma_ucm]
      	ucma_write+0xe0/0x160 [rdma_ucm]
      	__vfs_write+0x3a/0x16d
      	vfs_write+0xb2/0x1a1
      	? syscall_trace_enter+0x1ce/0x2b8
      	SyS_write+0x5c/0xd3
      	do_syscall_64+0x79/0x1b9
      	entry_SYSCALL_64_after_hwframe+0x16d/0x0
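
The wait described above can be pictured with a small userspace sketch. It is not the patch itself: `fake_completion`, `fake_wait_for_completion_timeout()`, `destroy_id_wait_timeout()` and `destroy_id_wait()` are hypothetical names, and the "completion" is just a counter that finishes after a fixed number of intervals. The point is only the shape of the loop: wait one interval, and on every timeout call a separate noinline function that a tracer could attach to.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a kernel completion: it "completes" once the
 * given number of timeout intervals has elapsed. */
struct fake_completion {
	int intervals_until_done;
};

/* Counts calls to the probe-able function, for illustration only. */
static int probe_hits;

/* Models one bounded wait: returns true if the completion fired within
 * this interval, false if the interval timed out. */
static bool fake_wait_for_completion_timeout(struct fake_completion *c)
{
	if (c->intervals_until_done == 0)
		return true;
	c->intervals_until_done--;
	return false;
}

/* Kept out of line so that a tracer has a stable symbol to hook.
 * The attribute is GCC/Clang specific; the name is an assumption. */
__attribute__((noinline))
static void destroy_id_wait_timeout(int id, int timeouts_so_far)
{
	(void)id;
	(void)timeouts_so_far;
	probe_hits++;
}

/* Waits until the completion fires, calling the probe-able function after
 * every timed-out interval. Returns the number of timeouts seen. */
static int destroy_id_wait(int id, struct fake_completion *c)
{
	int timeouts = 0;

	assert(c->intervals_until_done >= 0);
	while (!fake_wait_for_completion_timeout(c))
		destroy_id_wait_timeout(id, ++timeouts);
	return timeouts;
}
```

In this sketch the wait still only ends when the completion fires, which matches the statement that existing functionality is unchanged; the only new behavior is the call made at each timeout.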
      Signed-off-by: Manjunath Patil <manjunath.b.patil@oracle.com>
      Link: https://lore.kernel.org/r/20240309063323.458102-1-manjunath.b.patil@oracle.com
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
  2. 07 Mar, 2024 3 commits
  3. 03 Mar, 2024 2 commits
      RDMA/uverbs: Avoid -Wflex-array-member-not-at-end warnings · 155f0436
      Gustavo A. R. Silva authored
      -Wflex-array-member-not-at-end is coming in GCC-14, and we are getting
      ready to enable it globally.
      
      There are currently two members (`alloc_head` and `bundle`) in
      `struct bundle_priv` whose types are flexible structures, that is,
      structs ending in a flexible-array member:
      
      struct bundle_priv {
              /* Must be first */
              struct bundle_alloc_head alloc_head;
      
      	...
      
              /*
               * Must be last. bundle ends in a flex array which overlaps
               * internal_buffer.
               */
              struct uverbs_attr_bundle bundle;
              u64 internal_buffer[32];
      };
      
      So, in order to avoid ending up with a couple of flexible-array members
      in the middle of a struct, we use the `struct_group_tagged()` helper to
      separate the flexible array from the rest of the members in the flexible
      structures:
      
      struct uverbs_attr_bundle {
              struct_group_tagged(uverbs_attr_bundle_hdr, hdr,
      		... the rest of the members
              );
              struct uverbs_attr attrs[];
      };
      
      With the change described above, we now declare objects of the type of
      the tagged struct without embedding flexible arrays in the middle of
      another struct:
      
      struct bundle_priv {
              /* Must be first */
              struct bundle_alloc_head_hdr alloc_head;
      
              ...
      
              struct uverbs_attr_bundle_hdr bundle;
              u64 internal_buffer[32];
      };
      
      We also use `container_of()` whenever we need to retrieve a pointer
      to the flexible structures.
      
      Notice that the `bundle_size` computed in `uapi_compute_bundle_size()`
      remains the same.
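
A userspace sketch of the same idea follows. All names here (`demo_attr`, `demo_bundle`, `demo_bundle_hdr`, `demo_priv`, `demo_bundle_alloc()`, `demo_bundle_from_hdr()`) are hypothetical, and the anonymous union is written out by hand as an assumption about what `struct_group_tagged()` expands to. It shows a header-only type that can sit in the middle of another struct, and `container_of()` recovering the full flexible structure from a pointer to its header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace version of the kernel's container_of() helper. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct demo_attr {
	int id;
};

/* Header-only type: every member except the flexible array. */
struct demo_bundle_hdr {
	int method;
	size_t num_attrs;
};

/* Flexible structure. The anonymous union lets the same storage be
 * reached as b->method or as b->hdr.method. */
struct demo_bundle {
	union {
		struct demo_bundle_hdr hdr;
		struct {
			int method;
			size_t num_attrs;
		};
	};
	struct demo_attr attrs[];
};

/* The container embeds only the header, so no flexible-array member
 * ends up in the middle of a struct. */
struct demo_priv {
	int cookie;
	struct demo_bundle_hdr bundle;
	unsigned long long internal_buffer[32];
};

/* Allocates a bundle with room for n attributes (zero-initialized). */
static struct demo_bundle *demo_bundle_alloc(size_t n)
{
	struct demo_bundle *b;

	b = calloc(1, sizeof(*b) + n * sizeof(b->attrs[0]));
	if (b)
		b->num_attrs = n;
	return b;
}

/* Recovers the flexible structure from a pointer to its header. */
static struct demo_bundle *demo_bundle_from_hdr(struct demo_bundle_hdr *hdr)
{
	return container_of(hdr, struct demo_bundle, hdr);
}
```

Code that only needs the fixed members can take a `struct demo_bundle_hdr *`; code that needs the array converts back with `demo_bundle_from_hdr()`.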
      
      So, with these changes, fix the following warnings:
      
      drivers/infiniband/core/uverbs_ioctl.c:45:34: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
         45 |         struct bundle_alloc_head alloc_head;
            |                                  ^~~~~~~~~~
      drivers/infiniband/core/uverbs_ioctl.c:67:35: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
         67 |         struct uverbs_attr_bundle bundle;
            |                                   ^~~~~~
      Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
      Link: https://lore.kernel.org/r/ZeIgeZ5Sb0IZTOyt@neat
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      RDMA/hns: Support userspace configuring congestion control algorithm with QP granularity · 6ec429d5
      Junxian Huang authored
      Currently, the congestion control algorithm is statically configured in
      FW, and all QPs use the same algorithm (except UD, which has a fixed
      configuration of DCQCN). This is not flexible enough.
      
      Support configuring the congestion control algorithm from userspace with
      QP granularity when creating QPs. If userspace does not specify an
      algorithm, use the default one.
      Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
      Link: https://lore.kernel.org/r/20240301104845.1141083-1-huangjunxian6@hisilicon.com
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
  4. 25 Feb, 2024 1 commit
  5. 21 Feb, 2024 2 commits
      RDMA/uverbs: Remove flexible arrays from struct *_filter · 14b526f5
      Erick Archer authored
      When a struct containing a flexible array is included in another struct,
      and there is a member after the struct-with-flex-array, there is a
      possibility of memory overlap. These cases must be audited [1]. See:
      
      struct inner {
      	...
      	int flex[];
      };
      
      struct outer {
      	...
      	struct inner header;
      	int overlap;
      	...
      };
      
      This is the scenario for all the "struct *_filter" structures that are
      included in the following "struct ib_flow_spec_*" structures:
      
      struct ib_flow_spec_eth
      struct ib_flow_spec_ib
      struct ib_flow_spec_ipv4
      struct ib_flow_spec_ipv6
      struct ib_flow_spec_tcp_udp
      struct ib_flow_spec_tunnel
      struct ib_flow_spec_esp
      struct ib_flow_spec_gre
      struct ib_flow_spec_mpls
      
      The pattern is like the one shown below:
      
      struct *_filter {
      	...
      	u8 real_sz[];
      };
      
      struct ib_flow_spec_* {
      	...
      	struct *_filter val;
      	struct *_filter mask;
      };
      
      In this case, the trailing flexible array "real_sz" is never allocated
      and is only used to calculate the size of the structures. Because the
      goal is simply to get the size of these structures, the "offsetof"
      helper can be replaced by the "sizeof" operator, and the trailing
      flexible arrays can then be removed.
      
      However, because structs can have trailing padding, it is possible
      that:
      
      offsetof(struct *_filter, real_sz) != sizeof(struct *_filter)
      
      This happens with "struct ib_flow_ipv6_filter", so the "__packed"
      attribute is applied to this structure to avoid it. As a result,
      "sizeof(struct ib_flow_ipv6_filter)" has changed. This is not a problem
      since this size is not used in the code.
      
      The situation now is that "sizeof(struct ib_flow_spec_ipv6)" has also
      changed (this struct contains the struct ib_flow_ipv6_filter). This is
      also not a problem since it is only used to set the size of the "union
      ib_flow_spec", which can store all the "ib_flow_spec_*" structures.
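
The padding effect can be checked with a small userspace sketch. The three `demo_filter_*` structs and their field names are hypothetical and are not the kernel definitions; the sizes assume a mainstream ABI where `uint32_t` is 4 bytes with at most 4-byte alignment. `__attribute__((packed))` is the GCC/Clang spelling assumed here for what the kernel calls `__packed`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Old pattern: the flexible array only marks where the real data ends. */
struct demo_filter_flex {
	uint32_t flow_label;
	uint8_t next_hdr;
	uint8_t traffic_class;
	uint8_t hop_limit;
	uint8_t real_sz[];
};

/* Flexible array removed: sizeof() now includes one byte of tail padding. */
struct demo_filter_plain {
	uint32_t flow_label;
	uint8_t next_hdr;
	uint8_t traffic_class;
	uint8_t hop_limit;
};

/* Flexible array removed and struct packed: no tail padding. */
struct demo_filter_packed {
	uint32_t flow_label;
	uint8_t next_hdr;
	uint8_t traffic_class;
	uint8_t hop_limit;
} __attribute__((packed));

/* 4 + 1 + 1 + 1 = 7 bytes of real data. */
static_assert(offsetof(struct demo_filter_flex, real_sz) == 7,
	      "flexible array starts right after the last real member");
static_assert(sizeof(struct demo_filter_plain) == 8,
	      "unpacked struct is padded up to its alignment");
static_assert(sizeof(struct demo_filter_packed) == 7,
	      "packed struct matches the old offsetof() value");
```

Under these assumptions, replacing `offsetof(..., real_sz)` with `sizeof()` only preserves the value when the struct has no tail padding, which is what packing restores.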
      
      Link: https://lore.kernel.org/r/20240217142913.4285-1-erick.archer@gmx.com
      Signed-off-by: Erick Archer <erick.archer@gmx.com>
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
      RDMA/device: Fix a race between mad_client and cm_client init · 7a8bccd8
      Shifeng Li authored
      The mad_client is initialized in enable_device_and_get() after
      devices_rwsem has been downgraded to a read semaphore. This leaves a
      window in which initialization of cm_client fails: it cannot find a
      matching mad port in ib_mad_port_list, because the matching mad port is
      only added to the list afterwards.
      
          mad_client    |                       cm_client
      ------------------|--------------------------------------------------------
      ib_register_device|
      enable_device_and_get
      down_write(&devices_rwsem)
      xa_set_mark(&devices, DEVICE_REGISTERED)
      downgrade_write(&devices_rwsem)
                        |
                        |ib_cm_init
                        |ib_register_client(&cm_client)
                        |down_read(&devices_rwsem)
                        |xa_for_each_marked (&devices, DEVICE_REGISTERED)
                        |add_client_context
                        |cm_add_one
                        |ib_register_mad_agent
                        |ib_get_mad_port
                        |__ib_get_mad_port
                        |list_for_each_entry(entry, &ib_mad_port_list, port_list)
                        |return NULL
                        |up_read(&devices_rwsem)
                        |
      add_client_context|
      ib_mad_init_device|
      ib_mad_port_open  |
      list_add_tail(&port_priv->port_list, &ib_mad_port_list)
      up_read(&devices_rwsem)
                        |
      
      Fix it by using down_write(&devices_rwsem) in ib_register_client().
      
      Fixes: d0899892 ("RDMA/device: Provide APIs from the core code to help unregistration")
      Link: https://lore.kernel.org/r/20240203035313.98991-1-lishifeng@sangfor.com.cn
      Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
      Signed-off-by: Shifeng Li <lishifeng@sangfor.com.cn>
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
  6. 19 Feb, 2024 1 commit
  7. 04 Feb, 2024 3 commits
  8. 31 Jan, 2024 1 commit
  9. 25 Jan, 2024 15 commits
  10. 21 Jan, 2024 11 commits