1. 09 Sep, 2010 2 commits
    • Tina Yang's avatar
      RDS: Fix corrupted rds_mrs · 35b52c70
      Tina Yang authored
      On second look at this bug (OFED #2002), it seems that the
      collision is not with the retransmission queue (packet acked
      by the peer), but with the local send completion.  A theoretical
      sequence of events (from time t0 to t3) is thought to be as
      follows,
      
      Thread #1
      t0:
          sock_release
          rds_release
          rds_send_drop_to /* wait on send completion */
      t2:
          rds_rdma_drop_keys()   /* destroy & free all mrs */
      
      Thread #2
      t1:
          rds_ib_send_cq_comp_handler
          rds_ib_send_unmap_rm
          rds_message_unmapped   /* wake up #1 @ t0 */
      t3:
          rds_message_put
          rds_message_purge
          rds_mr_put   /* memory corruption detected */
      
      The problem with the rds_rdma_drop_keys() is it could
      remove a mr's refcount more than its due (i.e. repeatedly
      as long as it still remains in the tree (mr->r_refcount > 0)).
      Theoretically it should remove only one reference - reference
      by the tree.
      
              /* Release any MRs associated with this socket */
              while ((node = rb_first(&rs->rs_rdma_keys))) {
                      mr = container_of(node, struct rds_mr, r_rb_node);
                      if (mr->r_trans == rs->rs_transport)
                              mr->r_invalidate = 0;
                      rds_mr_put(mr);
              }
      
      I think the correct way of doing it is to remove the mr from
      the tree and rds_destroy_mr it first, then a rds_mr_put()
      to decrement its reference count by one.  Whichever thread
      holds the last reference will free the mr via rds_mr_put().
      Signed-off-by: default avatarTina Yang <tina.yang@oracle.com>
      Signed-off-by: default avatarAndy Grover <andy.grover@oracle.com>
      35b52c70
    • Andy Grover's avatar
      RDS: Fix BUG_ONs to not fire when in a tasklet · 9e2effba
      Andy Grover authored
      in_interrupt() is true in softirqs. The BUG_ONs are supposed
      to check for if irqs are disabled, so we should use
      BUG_ON(irqs_disabled()) instead, duh.
      Signed-off-by: default avatarAndy Grover <andy.grover@oracle.com>
      9e2effba
  2. 07 Sep, 2010 38 commits