Commit bca9fc14 authored by Jeff Layton, committed by Ilya Dryomov

ceph: when filling trace, call ceph_get_inode outside of mutexes

Geng Jichao reported a rather complex deadlock involving several
moving parts:

1) readahead is issued against an inode and some of its pages are locked
   while the read is in flight

2) the same inode is evicted from the cache, and this task gets stuck
   waiting for the page lock because of the above readahead

3) another task is processing a reply trace, and looks up the inode
   being evicted while holding the s_mutex. That ends up waiting for the
   eviction to complete

4) a write reply for an unrelated inode is then processed in the
   ceph_con_workfn job. It calls ceph_check_caps after putting wrbuffer
   caps, and that gets stuck waiting on the s_mutex held by 3.

The reply to "1" is stuck behind the write reply in "4", so we deadlock
at that point.

This patch changes the trace processing to call ceph_get_inode outside
of the s_mutex and snap_rwsem, which should break the cycle above.

[ idryomov: break unnecessarily long lines ]

URL: https://tracker.ceph.com/issues/47998
Reported-by: Geng Jichao <gengjichao@jd.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Luis Henriques <lhenriques@suse.de>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
parent 6646ea1c
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -1315,15 +1315,10 @@ int ceph_fill_trace(struct super_block *sb, struct ceph_mds_request *req)
 	}
 
 	if (rinfo->head->is_target) {
-		tvino.ino = le64_to_cpu(rinfo->targeti.in->ino);
-		tvino.snap = le64_to_cpu(rinfo->targeti.in->snapid);
-		in = ceph_get_inode(sb, tvino);
-		if (IS_ERR(in)) {
-			err = PTR_ERR(in);
-			goto done;
-		}
+		/* Should be filled in by handle_reply */
+		BUG_ON(!req->r_target_inode);
 
+		in = req->r_target_inode;
 		err = ceph_fill_inode(in, req->r_locked_page, &rinfo->targeti,
 				NULL, session,
 				(!test_bit(CEPH_MDS_R_ABORTED, &req->r_req_flags) &&
@@ -1333,13 +1328,13 @@ int ceph_fill_trace(struct super_block *sb, struct ceph_mds_request *req)
 		if (err < 0) {
 			pr_err("ceph_fill_inode badness %p %llx.%llx\n",
 			       in, ceph_vinop(in));
+			req->r_target_inode = NULL;
 			if (in->i_state & I_NEW)
 				discard_new_inode(in);
 			else
 				iput(in);
 			goto done;
 		}
-		req->r_target_inode = in;
 		if (in->i_state & I_NEW)
 			unlock_new_inode(in);
 	}
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -3179,6 +3179,23 @@ static void handle_reply(struct ceph_mds_session *session, struct ceph_msg *msg)
 	err = parse_reply_info(session, msg, rinfo, session->s_con.peer_features);
 	mutex_unlock(&mdsc->mutex);
 
+	/* Must find target inode outside of mutexes to avoid deadlocks */
+	if ((err >= 0) && rinfo->head->is_target) {
+		struct inode *in;
+		struct ceph_vino tvino = {
+			.ino  = le64_to_cpu(rinfo->targeti.in->ino),
+			.snap = le64_to_cpu(rinfo->targeti.in->snapid)
+		};
+
+		in = ceph_get_inode(mdsc->fsc->sb, tvino);
+		if (IS_ERR(in)) {
+			err = PTR_ERR(in);
+			mutex_lock(&session->s_mutex);
+			goto out_err;
+		}
+		req->r_target_inode = in;
+	}
+
 	mutex_lock(&session->s_mutex);
 	if (err < 0) {
 		pr_err("mdsc_handle_reply got corrupt reply mds%d(tid:%lld)\n", mds, tid);
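The hunks above boil down to a lock-ordering rule: the lookup that may block on inode eviction is now performed before session->s_mutex is taken, so nothing that the eviction indirectly depends on is held across that wait. The following is a small userspace sketch of that rule only, not of the kernel code; every identifier in it (s_mutex, blocking_inode_lookup, other_reply_work, and so on) is illustrative, and the four-party cycle from the report is collapsed into two threads.

/*
 * Toy userspace model of the lock-ordering point behind this patch, not
 * CephFS code: a lookup that can block (here, waiting for "eviction" to
 * finish) must not be performed while holding a mutex that the work it
 * is waiting on also needs. All names below are illustrative.
 *
 * Build: cc -pthread demo.c
 */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t s_mutex = PTHREAD_MUTEX_INITIALIZER; /* stands in for session->s_mutex */
static pthread_mutex_t evict_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  evict_done = PTHREAD_COND_INITIALIZER;
static int eviction_finished;

/* Stands in for ceph_get_inode(): may block until "eviction" completes. */
static void blocking_inode_lookup(void)
{
	pthread_mutex_lock(&evict_lock);
	while (!eviction_finished)
		pthread_cond_wait(&evict_done, &evict_lock);
	pthread_mutex_unlock(&evict_lock);
}

/* Stands in for the other reply/writeback work that needs s_mutex to finish. */
static void *other_reply_work(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&s_mutex);   /* never acquired if the lookup already holds s_mutex */
	pthread_mutex_unlock(&s_mutex);

	pthread_mutex_lock(&evict_lock);
	eviction_finished = 1;          /* "eviction" can only finish after that work ran */
	pthread_cond_signal(&evict_done);
	pthread_mutex_unlock(&evict_lock);
	return NULL;
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, other_reply_work, NULL);

	/*
	 * Patched ordering: resolve the "inode" first, take s_mutex afterwards.
	 * Swapping the next two steps (lock s_mutex, then do the blocking
	 * lookup) reproduces the reported shape of the deadlock: the other
	 * thread can never take s_mutex, so the lookup never unblocks.
	 */
	blocking_inode_lookup();
	pthread_mutex_lock(&s_mutex);
	printf("trace filled without deadlocking\n");
	pthread_mutex_unlock(&s_mutex);

	pthread_join(t, NULL);
	return 0;
}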