- 09 Feb, 2019 1 commit
-
-
Doug Ledford authored
Omni-Path TID RDMA Feature Intel Omni-Path (OPA) TID RDMA support is a feature that accelerates data movement between two OPA nodes through the IB Verbs interface. It improves RDMA READ/WRITE performance by delivering the data payload to a user buffer directly without any software copying. Architecture ============= The TID RDMA protocol is implemented on the hfi1 driver level and is therefore transparent to the ULPs. It is designed to facilitate the data transactions for two specific RDMA requests: - RDMA READ; - RDMA WRITE. Previously, when a verbs data packet is received at the destination (requester side for RDMA READ and responder side for RDMA WRITE), the data payload is copied to the user buffer by software, which slows down the performance significantly for large requests. Internally, hfi1 converts qualified RDMA READ/WRITE requests into TID RDMA READ/WRITE requests when the requests are post sent to the hfi1 driver. Non-qualified RDMA requests are handled by normal RDMA protocol. For TID RDMA requests, hardware resources (hardware flow and TID entries) are allocated on the destination side (the requester side for TID RDMA READ and the responder side for TID RDMA WRITE). The information for these resources is conveyed to the data source side (the responder side for TID RDMA READ and the requester side for TID RDMA WRITE) and embedded in data packets. When data packets are received by the destination, hardware will deliver the data payload to the destination buffer without involving software and therefore improve the performance. Details ======= RDMA READ/WRITE requests are qualified by the following: - Total data length >= 256k; - Totoal data length is a multiple of 4K pages. Additional qualifications are enforced for the destination buffers: For RDMA RAED: - Each destination sge buffer is 4K aligned; - Each destination sge buffer is a multiple of 4K pages. For RDMA WRITE: - The destination number is 4K aligned. In addition, in an OPA fabric, some nodes may support TID RDMA while others may not. As such, it is important for two transaction nodes to exchange the information about the features they support. This discovery mechanism is called OPA Feature Negotion (OPFN) and is described in details in the patch series. Through OPFN, two nodes can find whether they both support TID RDMA and subsequently convert RDMA requests into TID RDMA requests. * hfi1-tid: (46 commits) IB/hfi1: Prioritize the sending of ACK packets IB/hfi1: Add static trace for TID RDMA WRITE protocol IB/hfi1: Enable TID RDMA WRITE protocol IB/hfi1: Add interlock between TID RDMA WRITE and other requests IB/hfi1: Add TID RDMA WRITE functionality into RDMA verbs IB/hfi1: Add the dual leg code IB/hfi1: Add the TID second leg ACK packet builder IB/hfi1: Add the TID second leg send packet builder IB/hfi1: Resend the TID RDMA WRITE DATA packets IB/hfi1: Add a function to receive TID RDMA RESYNC packet IB/hfi1: Add a function to build TID RDMA RESYNC packet IB/hfi1: Add TID RDMA retry timer IB/hfi1: Add a function to receive TID RDMA ACK packet IB/hfi1: Add a function to build TID RDMA ACK packet IB/hfi1: Add a function to receive TID RDMA WRITE DATA packet IB/hfi1: Add a function to build TID RDMA WRITE DATA packet IB/hfi1: Add a function to receive TID RDMA WRITE response IB/hfi1: Add TID resource timer IB/hfi1: Add a function to build TID RDMA WRITE response IB/hfi1: Add functions to receive TID RDMA WRITE request ... Signed-off-by: Doug Ledford <dledford@redhat.com>
-
- 05 Feb, 2019 39 commits
-
-
Doug Ledford authored
Here is the final set of patches for TID RDMA. Again this is code which was previously submitted but re-organized so as to be easier to review. Similar to how the READ series was organized the patches to build, receive, allocate resources etc are broken out. For details on TID RDMA as a whole again refer to the original cover letter. https://www.spinics.net/lists/linux-rdma/msg66611.html * tid-write: (23 commits) IB/hfi1: Prioritize the sending of ACK packets IB/hfi1: Add static trace for TID RDMA WRITE protocol IB/hfi1: Enable TID RDMA WRITE protocol IB/hfi1: Add interlock between TID RDMA WRITE and other requests IB/hfi1: Add TID RDMA WRITE functionality into RDMA verbs IB/hfi1: Add the dual leg code IB/hfi1: Add the TID second leg ACK packet builder IB/hfi1: Add the TID second leg send packet builder IB/hfi1: Resend the TID RDMA WRITE DATA packets IB/hfi1: Add a function to receive TID RDMA RESYNC packet IB/hfi1: Add a function to build TID RDMA RESYNC packet IB/hfi1: Add TID RDMA retry timer IB/hfi1: Add a function to receive TID RDMA ACK packet IB/hfi1: Add a function to build TID RDMA ACK packet IB/hfi1: Add a function to receive TID RDMA WRITE DATA packet IB/hfi1: Add a function to build TID RDMA WRITE DATA packet IB/hfi1: Add a function to receive TID RDMA WRITE response IB/hfi1: Add TID resource timer IB/hfi1: Add a function to build TID RDMA WRITE response IB/hfi1: Add functions to receive TID RDMA WRITE request ... Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Kaike Wan authored
ACK packets are generally associated with request completion and resource release and therefore should be sent first. This patch optimizes the send engine by using the following policies: (1) QPs with RVT_S_ACK_PENDING bit set in qp->s_flags or qpriv->s_flags should have their priority incremented; (2) QPs with ACK or TID-ACK packet queued should have their priority incremented; (3) When a QP is queued to the wait list due to resource constraints, it will be queued to the head if it has ACK packet to send; (4) When selecting qps to run from the wait list, the one with the highest priority and starve_cnt will be selected; each priority will be equivalent to a fixed number of starve_cnt (16). Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Kaike Wan authored
This patch makes the following changes to the static trace: 1. Adds the decoding of TID RDMA WRITE packets in IB header trace; 2. Adds trace events for various stages of the TID RDMA WRITE protocol. These events provide a fine-grained control for monitoring and debugging the hfi1 driver in the filed. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Kaike Wan authored
This patch enables TID RDMA WRITE protocol by converting a qualified RDMA WRITE request into a TID RDMA WRITE request internally: (1) The TID RDMA cability must be enabled; (2) The request must start on a 4K page boundary; (3) The request length must be a multiple of 4K and must be larger or equal to 256K. Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Kaike Wan authored
This locking mechanism is designed to provent vavious memory corruption scenarios from occurring when requests are pipelined, especially when RDMA WRITE requests are interleaved with TID RDMA READ requests: 1. READ-AFTER-READ; 2. READ-AFTER-WRITE; 3. WRITE-AFTER-READ; 4. WRITE-AFTER-WRITE. When memory corruption is likely, a request will be held back until previous requests have been completed. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Kaike Wan authored
This patch integrates TID RDMA WRITE protocol into normal RDMA verbs framework. The TID RDMA WRITE protocol is an end-to-end protocol between the hfi1 drivers on two OPA nodes that converts a qualified RDMA WRITE request into a TID RDMA WRITE request to avoid data copying on the responder side. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Kaike Wan authored
The "Second Leg" of the TID RDMA WRITE protocol deals with the transfer of data and ack packets, which are in the KDETH PSN space, as opposed to the IB PSN space. Therefore, the Second Leg could be considered as a separate state machine. As such, it is handled by a different work queue item which is scheduled along with the normal IB state machine work item. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Kaike Wan authored
This patch adds the TID packet builder for the responder side, which contains the state machine to build TID RDMA ACK packet for either TID RDMA WRITE DATA or TID RDMA RESYNC packets. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Kaike Wan authored
To improve performance, the TID RDMA WRITE protocol is designed to own a second leg to send data and ack packets in the KDETH PSN space. This patch adds the packet builder for the requester side, which contains the state machine to build TID RDMA WRITE DATA and TID RDMA RESYNC packet. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Kaike Wan authored
This patch adds the logic to resend TID RDMA WRITE DATA packets. The tracking indices will be reset properly so that the correct TID entries will be used. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Kaike Wan authored
This patch adds a function to receive TID RDMA RESYNC packet on the responder side. The QP's hardware flow will be updated and all allocated software flows will be updated accordingly in order to drop all stale packets. Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Kaike Wan authored
This patch adds a function to build TID RDMA RESYNC packet, which is sent by the requester to notify the responder that no TID RDMA ACK packet has been received for a given KDETH PSN. Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Kaike Wan authored
This patch adds the TID RDMA retry timer to make sure that TID RDMA WRITE DATA packets for a segment are received successfully by the responder. This timer is generally armed when the last TID RDMA WRITE DATA packet for a segment is sent out and stopped when all TID RDMA DATA packets are acknowledged. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Kaike Wan authored
This patch adds a function to receive TID RDMA ACK packet, which could be an acknowledge to either a TID RDMA WRITE DATA packet or an TID RDMA RESYNC packet. For an ACK to TID RDMA WRITE DATA packet, the request segments are completed appropriately. For an ACK to a TID RDMA RESYNC packet, any pending segment flow information is updated accordingly. Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Kaike Wan authored
This patch adds a function to build TID RDMA ACJ packet, which is also in the KDETH PSN space for packet ordering. This packet is used to acknowledge the receiving of all the TID RDMA WRITE DATA packets before the given KDETH PSN. Similar to RC ACK packets, TID RDMA ACK packets could also be coalesced. Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Kaike Wan authored
This patch adds a function to receive TID RDMA WRITE DATA packet, which is in the KDETH PSN space in packet ordering. Due to the use of header suppression, software is generally only notified when the last data packet for a segment is received. This patch also adds code to handle KDETH EFLAGS errors for ingress TID RDMA WRITE DATA packets. Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Kaike Wan authored
This patch adds a function to build TID RDMA WRITE DATA packet. Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Kaike Wan authored
This patch adds a function to receive TID RDMA WRITE response. The TID entries will be stored for encoding TID RDMA WRITE DATA packet later. Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Kaike Wan authored
This patch adds the TID resource timer, which is used by the responder to free any TID resources that are allocated for TID RDMA WRITE request and not returned by the requester after a reasonable time. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Kaike Wan authored
This patch adds the function to build TID RDMA WRITE response. The main role of the TID RDMA WRITE RESP packet is to send TID entries to the requester so that they can be used to encode TID RDMA WRITE DATA packet. Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Kaike Wan authored
This patch adds the functions to receive TID RDMA WRITE request. The request will be stored in the QP's s_ack_queue. This patch also adds code to handle duplicate TID RDMA WRITE request and a function to allocate TID resources for data receiving on the responder side. Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Kaike Wan authored
The s_ack_queue is managed by two pointers into the ring: r_head_ack_queue and s_tail_ack_queue. r_head_ack_queue is the index of where the next received request is going to be placed and s_tail_ack_queue is the entry of the request currently being processed. This works perfectly fine for normal Verbs as the requests are processed one at a time and the s_tail_ack_queue is not moved until the request that it points to is fully completed. In this fashion, s_tail_ack_queue constantly chases r_head_ack_queue and the two pointers can easily be used to determine "queue full" and "queue empty" conditions. The detection of these two conditions are imported in determining when an old entry can safely be overwritten with a new received request and the resources associated with the old request be safely released. When pipelined TID RDMA WRITE is introduced into this mix, things look very different. r_head_ack_queue is still the point at which a newly received request will be inserted, s_tail_ack_queue is still the currently processed request. However, with pipelined TID RDMA WRITE requests, s_tail_ack_queue moves to the next request once all TID RDMA WRITE responses for that request have been sent. The rest of the protocol for a particular request is managed by other pointers specific to TID RDMA - r_tid_tail and r_tid_ack - which point to the entries for which the next TID RDMA DATA packets are going to arrive and the request for which the next TID RDMA ACK packets are to be generated, respectively. What this means is that entries in the ring, which are "behind" s_tail_ack_queue (entries which s_tail_ack_queue has gone past) are no longer considered complete. This is where the problem is - a newly received request could potentially overwrite a still active TID RDMA WRITE request. The reason why the TID RDMA pointers trail s_tail_ack_queue is that the normal Verbs send engine uses s_tail_ack_queue as the pointer for the next response. Since TID RDMA WRITE responses are processed by the normal Verbs send engine, s_tail_ack_queue had to be moved to the next entry once all TID RDMA WRITE response packets were sent to get the desired pipelining between requests. Doing otherwise would mean that the normal Verbs send engine would not be able to send the TID RDMA WRITE responses for the next TID RDMA request until the current one is fully completed. This patch introduces the s_acked_ack_queue index to point to the next request to complete on the responder side. For requests other than TID RDMA WRITE, s_acked_ack_queue should always be kept in sync with s_tail_ack_queue. For TID RDMA WRITE request, it may fall behind s_tail_ack_queue. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Kaike Wan authored
The TID RDMA WRITE protocol differs from normal IB RDMA WRITE in that TID RDMA WRITE requests do require responses, not just ACKs. Therefore, TID RDMA WRITE requests need to be treated as RDMA READ requests from the point of view of the QPs' s_ack_queue. In other words, the QPs' need to allow for TID RDMA WRITE requests to be stored in their s_ack_queue. However, because the user does not know anything about the TID RDMA capability and/or protocols, these extra entries in the queue cannot be advertized to the user. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Kaike Wan authored
This patch adds the functions to build TID RDMA WRITE request. The work request opcode, packet opcode, and packet formats for TID RDMA WRITE protocol are also defined in this patch. Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Doug Ledford authored
This is the series for adding TID RDMA read. Kaike put in a lot of effort into making this more consumable for review so special thanks to him. Allocating resources and tracing are separated out followed by patches which build up the read request. Then we have the patches to receive incoming TID RDMA read requests and handle integration with the RC protocol. See the cover letter of the original posting for more of a detailed overview of TID. https://www.spinics.net/lists/linux-rdma/msg66611.html * tid-read: IB/hfi1: Add static trace for TID RDMA READ protocol IB/hfi1: Enable TID RDMA READ protocol IB/hfi1: Add interlock between a TID RDMA request and other requests IB/hfi1: Integrate TID RDMA READ protocol into RC protocol IB/hfi1: Increment the retry timeout value for TID RDMA READ request IB/hfi1: Add functions for restarting TID RDMA READ request IB/hfi1: Add TID RDMA handlers IB/hfi1: Add functions to receive TID RDMA READ response IB/hfi1: Add a function to build TID RDMA READ response IB/hfi1: Add functions to receive TID RDMA READ request IB/hfi1: Set PbcInsertHcrc for TID RDMA packets IB/hfi1: Add functions to build TID RDMA READ request IB/hfi1: Add static trace for flow and TID management functions IB/hfi1: Add the counter n_tidwait IB/hfi1: TID RDMA RcvArray programming and TID allocation IB/hfi1: TID RDMA flow allocation IB/hfi: Move RC functions into a header file Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Kaike Wan authored
This patch makes the following changes to the static trace: 1. Adds the decoding of TID RDMA READ packets in IB header trace; 2. Tracks qpriv->s_flags and iow_flags in qpsleepwakeup trace; 3. Adds a new event to track RC ACK receiving; 4. Adds trace events for various stages of the TID RDMA READ protocol. These events provide a fine-grained control for monitoring and debugging the hfi1 driver in the filed. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Kaike Wan authored
This patch enables TID RDMA READ protocol by converting a qualified RDMA READ request into a TID RDMA READ request internally: (1) The TID RDMA capability must be enabled; (2) The request must start on a 4K page boundary and all receiving buffers must start on 4K page boundaries; (3) The request length must be a multiple of 4K and must be larger or equal to 256K. Each receiving buffer length must be a multiple of 4K. Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Kaike Wan authored
This locking mechanism is designed to provent vavious memory corruption scenarios from occurring when requests are pipelined, especially when RDMA READ/WRITE requests are interleaved with TID RDMA READ/WRITE requests: 1. READ-AFTER-READ; 2. READ-AFTER-WRITE; 3. WRITE-AFTER-READ; When memory corruption is likely, a request will be held back until previous requests have been completed. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Kaike Wan authored
This patch integrates the TID RDMA READ protocol into the IB RC protocol. This protocol is an end-to-end protocol between the hfi1 drivers on two OPA nodes that converts a qualified RDMA READ request into a TID RDMA READ request to avoid data copying on the requester side. The following codes are added in this patch: - Send the TID RDMA READ request; - Complete the TID RDMA READ send request; - Send the TID RDMA READ response; - Complete the TID RDMA READ request; Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Kaike Wan authored
The RC retry timeout value is based on the estimated time for the response packet to come back. However, for TID RDMA READ request, due to the use of header suppression, the driver is normally not notified for each incoming response packet until the last TID RDMA READ response packet. Consequently, the retry timeout value should be extended to cover the transaction time for the entire length of a segment (default 256K) instead of that for a single packet. This patch addresses the issue by introducing new retry timer functions to account for multiple packets and wrapper functions for backward compatibility. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Kaike Wan authored
This patch adds functions to retry TID RDMA READ request. Since TID RDMA READ request could be retried from any segment boundary, it requires a number of tracking fields in various structures and those fields should be reset properly. The qp->s_num_rd_atomic field is reset before retry and therefore should be incremented for each new or retried RDMA READ or atomic request. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Kaike Wan authored
This commit adds the TID RDMA READ pointers to the receiving opcode handlers. It also adds TID RDMA READ header sizes to header size table. A function to print the RHF EFLAGS errors is created so that it can be shared by both IB and TID RDMA receiving functions. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Kaike Wan authored
This patch adds the functions to receive TID RDMA READ response. The TID resource information in the KDETH packet header will direct the hardware to deliver the packet payload to the user buffer automatically and the software will handle the packet header for the last packet of a segment as all other packet headers are suppressed by default. The TID entries will be freed when all packets for a segment have been received. This patch also adds the functions to handle KDETH eflag errors, including flow sequence and generation errors, when a TID RDMA READ response packet is received . The flow sequence error can be recovered by software checking of the flow sequence and will disappear when the hardware flow is programmed with a new generation number. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Kaike Wan authored
This patch adds the function to build TID RDMA READ response packet. The previously received TID resource information will be used to build the KDETH packet, which will direct the delivery of packet payload by hardware. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Kaike Wan authored
This patch adds the functions to receive TID RDMA READ request. The TID resource information will be stored and tracked. Duplicate request will also be handled properly. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Kaike Wan authored
All TID RDMA packets are in KDETH packet format and therefore the PbcInsertHcrc must be set properly before sending the packet to hardware. Otherwise, the packets will be dropped by the receiver. By default, HCRC is not inserted for 9B packets without KDETH, and this patch adds that back for TID RDMA packets. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Kaike Wan authored
This patch adds the helper functions to build the TID RDMA READ request on the requester side. The key is to allocate TID resources (TID flow and TID entries) and send the resource information to the responder side along with the read request. Since the TID resources are limited, each TID RDMA READ request has to be split into segments with a default segment size of 256K. A software flow is allocated to track the data transaction for each segment. The work request opcode, packet opcode, and packet formats for TID RDMA READ protocol are also defined in this patch. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Kaike Wan authored
This patch adds the static trace for the flow and TID management functions to help debugging in the filed. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Kaike Wan authored
This patch adds the counter n_tidwait to count the number of times the TID resource allocator has to wait for TID resources. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-