• Eric Ren's avatar
    ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock · 439a36b8
    Eric Ren authored
    We are in the situation that we have to avoid recursive cluster locking,
    but there is no way to check if a cluster lock has been taken by a precess
    already.
    
    Mostly, we can avoid recursive locking by writing code carefully.
    However, we found that it's very hard to handle the routines that are
    invoked directly by vfs code.  For instance:
    
      const struct inode_operations ocfs2_file_iops = {
          .permission     = ocfs2_permission,
          .get_acl        = ocfs2_iop_get_acl,
          .set_acl        = ocfs2_iop_set_acl,
      };
    
    Both ocfs2_permission() and ocfs2_iop_get_acl() call ocfs2_inode_lock(PR):
    
      do_sys_open
       may_open
        inode_permission
         ocfs2_permission
          ocfs2_inode_lock() <=== first time
           generic_permission
            get_acl
             ocfs2_iop_get_acl
      	ocfs2_inode_lock() <=== recursive one
    
    A deadlock will occur if a remote EX request comes in between two of
    ocfs2_inode_lock().  Briefly describe how the deadlock is formed:
    
    On one hand, OCFS2_LOCK_BLOCKED flag of this lockres is set in
    BAST(ocfs2_generic_handle_bast) when downconvert is started on behalf of
    the remote EX lock request.  Another hand, the recursive cluster lock
    (the second one) will be blocked in in __ocfs2_cluster_lock() because of
    OCFS2_LOCK_BLOCKED.  But, the downconvert never complete, why? because
    there is no chance for the first cluster lock on this node to be
    unlocked - we block ourselves in the code path.
    
    The idea to fix this issue is mostly taken from gfs2 code.
    
    1. introduce a new field: struct ocfs2_lock_res.l_holders, to keep track
       of the processes' pid who has taken the cluster lock of this lock
       resource;
    
    2. introduce a new flag for ocfs2_inode_lock_full:
       OCFS2_META_LOCK_GETBH; it means just getting back disk inode bh for
       us if we've got cluster lock.
    
    3. export a helper: ocfs2_is_locked_by_me() is used to check if we have
       got the cluster lock in the upper code path.
    
    The tracking logic should be used by some of the ocfs2 vfs's callbacks,
    to solve the recursive locking issue cuased by the fact that vfs
    routines can call into each other.
    
    The performance penalty of processing the holder list should only be
    seen at a few cases where the tracking logic is used, such as get/set
    acl.
    
    You may ask what if the first time we got a PR lock, and the second time
    we want a EX lock? fortunately, this case never happens in the real
    world, as far as I can see, including permission check,
    (get|set)_(acl|attr), and the gfs2 code also do so.
    
    [sfr@canb.auug.org.au remove some inlines]
    Link: http://lkml.kernel.org/r/20170117100948.11657-2-zren@suse.comSigned-off-by: default avatarEric Ren <zren@suse.com>
    Reviewed-by: default avatarJunxiao Bi <junxiao.bi@oracle.com>
    Reviewed-by: default avatarJoseph Qi <jiangqi903@gmail.com>
    Cc: Stephen Rothwell <sfr@canb.auug.org.au>
    Cc: Mark Fasheh <mfasheh@versity.com>
    Cc: Joel Becker <jlbec@evilplan.org>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    439a36b8
dlmglue.h 6.32 KB