staging/rdma/hfi1: Prevent silent data corruption with user SDMA
User SDMA keeps track of progress into the submitted IO vectors by tracking an offset into the vectors when packets are submitted. This offset is updated after a successful submission of a txreq to the SDMA engine. The same offset was used when determining whether an IO vector should be 'freed' (pages unpinned) in the SDMA callback functions. This was causing a silent data corruption in big jobs (> 2 nodes, 120 ranks each) on the receive side because the send side was mistakenly unpinning the vector pages before the HW has processed all descriptors referencing the vector. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Showing
Please register or sign in to comment