rbd: fix safety of rbd_put_client()

The rbd_client structure uses a kref to arrange for cleaning up and freeing an instance when its last reference is dropped. The cleanup routine is rbd_client_release(), and one of the things it does is delete the rbd_client from rbd_client_list. It acquires node_lock to do so, but the way it is done is still not safe. The problem is that when attempting to reuse an existing rbd_client, the structure found might already be in the process of getting destroyed and cleaned up. Here's the scenario, with "CLIENT" representing an existing rbd_client that's involved in the race: Thread on CPU A | Thread on CPU B --------------- | --------------- rbd_put_client(CLIENT) | rbd_get_client() kref_put() | (acquires node_lock) kref->refcount becomes 0 | __rbd_client_find() returns CLIENT calls rbd_client_release() | kref_get(&CLIENT->kref); | (releases node_lock) (acquires node_lock) | deletes CLIENT from list | ...and starts using CLIENT... (releases node_lock) | and frees CLIENT | <-- but CLIENT gets freed here Fix this by having rbd_put_client() acquire node_lock. The result could still be improved, but at least it avoids this problem. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>

rbd: fix safety of rbd_put_client()
The rbd_client structure uses a kref to arrange for cleaning up and freeing an instance when its last reference is dropped. The cleanup routine is rbd_client_release(), and one of the things it does is delete the rbd_client from rbd_client_list. It acquires node_lock to do so, but the way it is done is still not safe. The problem is that when attempting to reuse an existing rbd_client, the structure found might already be in the process of getting destroyed and cleaned up. Here's the scenario, with "CLIENT" representing an existing rbd_client that's involved in the race: Thread on CPU A | Thread on CPU B --------------- | --------------- rbd_put_client(CLIENT) | rbd_get_client() kref_put() | (acquires node_lock) kref->refcount becomes 0 | __rbd_client_find() returns CLIENT calls rbd_client_release() | kref_get(&CLIENT->kref); | (releases node_lock) (acquires node_lock) | deletes CLIENT from list | ...and starts using CLIENT... (releases node_lock) | and frees CLIENT | <-- but CLIENT gets freed here Fix this by having rbd_put_client() acquire node_lock. The result could still be improved, but at least it avoids this problem. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>
d23a4b3f · Alex Elder · Sage Weil · 97bb59a0 · d23a4b3f
Commit d23a4b3f authored Jan 29, 2012 by Alex Elder Committed by Sage Weil Feb 02, 2012
Hide whitespace changes
Inline Side-by-side

Showing with 4 additions and 2 deletions

drivers/block/rbd.c drivers/block/rbd.c +4 -2

No files found.
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -407,15 +407,15 @@ static int rbd_get_client(struct rbd_device *rbd_dev, const char *mon_addr,

 /*
 * Destroy ceph client
+ *
+ * Caller must hold node_lock.
 */
 static void rbd_client_release(struct kref *kref)
 {
 	struct rbd_client *rbdc = container_of(kref, struct rbd_client, kref);

 	dout("rbd_release_client %p\n", rbdc);
-	spin_lock(&node_lock);
 	list_del(&rbdc->node);
-	spin_unlock(&node_lock);

 	ceph_destroy_client(rbdc->client);
 	kfree(rbdc->rbd_opts);
@@ -428,7 +428,9 @@ static void rbd_client_release(struct kref *kref)
 */
 static void rbd_put_client(struct rbd_device *rbd_dev)
 {
+	spin_lock(&node_lock);
 	kref_put(&rbd_dev->rbd_client->kref, rbd_client_release);
+	spin_unlock(&node_lock);
 	rbd_dev->rbd_client = NULL;
 	rbd_dev->client = NULL;
 }